Abstract

As biomedical research advances into more complicated systems, there is an increasing need to model
and analyze these systems to better understand them. Formal specification and analysis methods, such
as model checking techniques, hold great promise in helping further discovery and innovation for
complicated biochemical systems. Models can be tested and adapted inexpensively in silico, providing
new insights. However, developing accurate and efficient modeling methodologies and analysis
techniques remains an open challenge for biochemical systems. This thesis focuses on designing
appropriate modeling formalisms and efficient analysis algorithms for various biological systems in
three different thrusts:

• Modeling Formalisms: We have designed a multi-scale hybrid rule-based modeling formalism that
depicts intra- and intercellular dynamics using discrete and continuous variables, respectively. Its
hybrid character inherits the advantages of both logical and kinetic modeling approaches.

• Formal Analysis Algorithms: 1) We have developed an LTL model checking algorithm for Qualitative
Networks (QNs). It exploits a distinctive feature of QNs and combines it with over-approximation to
compute decreasing sequences of reachability sets, resulting in a more scalable method. 2) We have
developed a formal analysis method to handle probabilistic bounded reachability problems for two
kinds of stochastic hybrid systems, considering uncertain parameters and probabilistic jumps. It
combines an SMT-based model checking technique with statistical tests in a sound manner. Compared to
standard simulation-based methods, it supports non-deterministic branching, increases simulation
coverage, and avoids the zero-crossing problem. 3) We have designed a new framework in which formal
methods and machine learning techniques work jointly to enhance the understanding of biological and
biomedical systems. Within this framework, statistical model checking is used as a (sub)model
selection method.


• Applications: To check the feasibility of our modeling language and algorithms, we have
1) constructed Boolean network models of the signaling network of a single pancreatic cancer cell
and used symbolic model checking to analyze these models, 2) built Qualitative Network models
describing cellular interactions during skin cell differentiation and applied our improved bounded
LTL model checking technique, 3) developed a multi-scale hybrid rule-based model of the pancreatic
cancer microenvironment and employed statistical model checking, 4) created a nonlinear hybrid model
depicting a bacteria-killing process and adopted a recently proposed δ-complete decision
procedure-based model checking technique, 5) extended the hybrid models for atrial fibrillation,
prostate cancer treatment, and our bacteria-killing process into stochastic hybrid models and
applied our probabilistic bounded reachability analyzer SReach, and 6) carried out a probabilistic
reachability analysis of the tap withdrawal circuit in C. elegans using SReach.
Chapter 1
Introduction 


Systems biology studies systems of biological components, which may be molecules, cells, organisms,
or entire species. It aims to better understand the properties of individual parts within complex
living systems as well as the dynamics of entire systems. To achieve this, mathematical and
computational models are constructed from quantitative measurements of the behavior of groups of
interacting elements, in order to reproduce and predict dynamical behaviors. For decades, biologists
have used diagrammatic models to describe and understand the mechanisms and dynamics behind their
experimental observations. Although these models are simple to build and understand, they offer only
a rather static picture of the corresponding biological systems, and their scalability is limited.
Later, models expressed mathematically (e.g., using differential equations) took the leading position,
and systems biologists translate such a model into a computer program that simulates it. As the
collaboration between biologists and computer scientists has become tighter, researchers have realized
that biological systems and (distributed) computer systems share many features: like (distributed)
computer systems, biological systems consist of various components that communicate with each other
and thus influence each other's behavior. This has led systems biologists and computer scientists to
borrow formal specification and analysis techniques originally designed for computer systems and to
develop biological domain-specific methods, and these techniques have since been applied
successfully to biomedical systems.

As shown in Figure 1.1, formal executable models, together with well-founded methods for analyzing
them, offer an excellent means to represent knowledge about biological systems and to reason about
these systems rigorously. Moreover, traditional in vivo and in vitro experiments are usually expensive
and time-consuming, whereas executing formal models, which provides in silico numerical evaluation of
hypotheses, takes comparatively little time and effort. This is especially true when different
experimental configurations must be considered: wet-lab experiments need to be repeated for each
configuration, whereas formal models require only minor modifications to the initial assignments of
system variables and parameters.

In the remainder of this chapter, we first review formal modeling formalisms that have been
successfully applied to biological and biomedical systems. We discuss the main ideas and important
features of five specification languages that have fallen on fertile ground in systems biology. (See
[25, 88, 155] for reviews of more classes of formal models that have been used in systems biology.)
Then, we briefly go over the primary model checking techniques and discuss their advantages over
mathematical analysis and simulation-based methods.

[Figure 1.1: diagram relating "Experimental Biology" and "Formal Specification and Analysis Methods"]

Figure 1.1: Experimental biology itself is an iterative process of hypothesis-driven experimentation
on a specific biological system. We can boost this process by using formal executable models and
formal analysis methods, such as model checking, to provide interesting new hypotheses for biological
experimental design.


1.1 Modeling Formalisms for Biological Systems

Boolean Networks

Boolean networks (BNs), one of the most widely used formal models, were first introduced by Kauffman
[142] in 1969 to model gene regulatory networks. A BN is a directed graph whose nodes are Boolean
variables. Their values represent the activity of the involved elements (e.g., genes or proteins).
At each time step, the next value of a variable is determined by a Boolean function of its
regulators. The values of all variables form a global state, which is updated synchronously. In this
way, the execution of a BN illuminates the causal and temporal relationships between the involved
elements.
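The synchronous update rule can be sketched in a few lines of Python. The three-node network below is a hypothetical toy example, not a model from this thesis: A activates B, B activates C, and C inhibits A. The names `sync_step` and `find_attractor` are illustrative. The state space is finite and the update is deterministic, so every trajectory eventually enters a cycle (an attractor), and `find_attractor` returns that cycle.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def sync_step(state: State, rules: Rules) -> State:
    """Synchronous update: every node reads the *old* global state."""
    return {node: rule(state) for node, rule in rules.items()}


def find_attractor(state: State, rules: Rules) -> List[State]:
    """Iterate until a global state repeats; return the cycle that was entered."""
    seen: Dict[tuple, int] = {}  # frozen state -> position in the trajectory
    trajectory: List[State] = []
    while True:
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory[seen[key]:]
        seen[key] = len(trajectory)
        trajectory.append(state)
        state = sync_step(state, rules)


# Hypothetical toy network (negative feedback loop).
rules: Rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

cycle = find_attractor({"A": True, "B": False, "C": False}, rules)
print(len(cycle))  # 6: the start state lies on an oscillating cycle
```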

The main advantage of this modeling language is that, even with a strongly simplified view of
biological networks, it can still capture network structure and dynamics and offer biologically
meaningful predictions and insights. Also, at such a high level of abstraction, it is possible to
model interactions among large numbers of elements and to perform model validation and model-based
prediction. BNs have been applied to analyze the robustness and stability of genetic regulatory
networks [12, 72, 152]. They have also been used to study cell signaling networks and to understand
their impact on distinct cell states [100, 101, 106, 126]. Moreover, BNs can be inferred directly from
experimental time-series data [10, 74, 90, 186].

BNs use a coarse approximation in which each modeled element is either active (on) or inactive (off),
and intermediate states are neglected. In real-world biological systems, however, some elements may
have multiple states. This gap between the binary assumption and biological reality has led
researchers to propose extensions of BNs. Examples include Qualitative Networks (QNs) [190], Gene
Regulatory Networks [168], and logical models with a time-delay mechanism (THiMED) [165]. In QNs, each
variable takes one of a small number of discrete values, and dependencies between variables are
algebraic functions instead of Boolean functions. Dynamically, a state of the model corresponds to a
valuation of the variables, and the values of variables change gradually according to these
algebraic functions. QNs have been shown to be a suitable formalism for modeling several biological
systems [31, 51, 190]. In the logical model using THiMED, system elements are modeled by multi-valued
variables. Timing details that capture relative delays between events are allowed and are implemented
by truth tables. Another type of extension of BNs copes with the inherent noise and uncertainty of
biological processes, such as Boolean networks with noise El and Probabilistic Boolean networks
[193]. These formalisms allow one to consider uncertainty in the knowledge of signaling networks as
well as stochasticity in biological systems.
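The gradual update of QNs can be sketched as follows. The sketch assumes the commonly used QN semantics: every variable takes values in {0, ..., N}, has a target value computed by its target function, and moves at most one level per synchronous step toward that target. The network and its target functions are hypothetical.

```python
from typing import Callable, Dict

QState = Dict[str, int]


def qn_step(state: QState, targets: Dict[str, Callable[[QState], int]], max_level: int) -> QState:
    """One synchronous QN step: each variable moves one level toward its target."""
    nxt = {}
    for var, target_fn in targets.items():
        target = max(0, min(max_level, target_fn(state)))  # clamp target to {0..max_level}
        cur = state[var]
        if cur < target:
            nxt[var] = cur + 1
        elif cur > target:
            nxt[var] = cur - 1
        else:
            nxt[var] = cur
    return nxt


# Hypothetical 3-variable QN with levels 0..4: X is a constant input at level 4,
# Y follows X, and Z follows Y one level lower.
targets = {
    "X": lambda s: 4,
    "Y": lambda s: s["X"],
    "Z": lambda s: s["Y"] - 1,
}

state = {"X": 0, "Y": 0, "Z": 0}
for _ in range(8):
    state = qn_step(state, targets, max_level=4)
print(state)  # {'X': 4, 'Y': 4, 'Z': 3}, a fixed point
```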
Petri Nets


Petri nets [172] were created by Carl Adam Petri in 1962 to describe chemical processes. They were
later used intensively in computer science to model and analyze concurrent and distributed systems. A
Petri net is a graph with two types of nodes, places and transitions, connected by directed arcs.
Places represent the resources of the system, and transitions represent the events that can change
the state of the resources. Directed arcs connect places to transitions and transitions to places;
they describe which places are pre- and/or post-conditions of which transitions. Data in Petri nets
are represented as so-called tokens. The state of the system is given by the tokens held in each
place, and one place may hold multiple tokens. Starting from an initial configuration that assigns
tokens to each place, transitions change the state of the system by moving tokens along arcs. A
transition is enabled when each of its input places holds enough tokens (at least the weight of the
connecting arc). When an enabled transition fires, it consumes tokens from its input places and
creates tokens in its output places. In a given state, more than one transition may be enabled, so
the execution of a Petri net is non-deterministic.
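The enabling and firing rules can be sketched in Python. The net is a hypothetical reversible binding reaction A + B <-> C with all arc weights equal to 1. The names `enabled`, `fire`, and `simulate` are illustrative. Non-determinism is resolved by a seeded random choice among the enabled transitions, which explores only one execution among many.

```python
import random
from typing import Dict, List, Tuple

Marking = Dict[str, int]
Transition = Tuple[Dict[str, int], Dict[str, int]]  # (input arc weights, output arc weights)


def enabled(marking: Marking, net: Dict[str, Transition]) -> List[str]:
    """A transition is enabled if every input place holds at least the arc weight in tokens."""
    return [name for name, (pre, _) in net.items()
            if all(marking.get(place, 0) >= w for place, w in pre.items())]


def fire(marking: Marking, transition: Transition) -> Marking:
    """Consume tokens from the input places and create tokens in the output places."""
    pre, post = transition
    new = dict(marking)
    for place, w in pre.items():
        new[place] -= w
    for place, w in post.items():
        new[place] = new.get(place, 0) + w
    return new


def simulate(marking: Marking, net: Dict[str, Transition], steps: int, seed: int = 0) -> List[Marking]:
    rng = random.Random(seed)
    trace = [marking]
    for _ in range(steps):
        choices = enabled(marking, net)
        if not choices:  # deadlock: no transition can fire
            break
        marking = fire(marking, net[rng.choice(choices)])  # non-deterministic choice
        trace.append(marking)
    return trace


# Hypothetical reversible binding A + B <-> C.
net = {
    "bind": ({"A": 1, "B": 1}, {"C": 1}),
    "unbind": ({"C": 1}, {"A": 1, "B": 1}),
}
m0 = {"A": 2, "B": 1, "C": 0}
print(enabled(m0, net))  # ['bind']
```

Along any run of this net the token sums A + C and B + C stay constant. Such conserved sums are a simple example of the structural properties that Petri net analysis exploits.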

Petri nets allow for concurrency and non-determinism. They provide a natural framework in which
qualitative analysis (given by the static structural topology of the net) and quantitative analysis
(given by the time evolution of the token distribution) are tightly integrated. Thus, this modeling
language is more general than BNs and strikes a good balance between modeling power and
analyzability. Petri nets are well suited to modeling the concurrent behavior of biochemical networks
such as genetic regulatory pathways. Specifically, places can represent genes, protein species, and
complexes. Transitions represent reactions or the transfer of a signal, and directed arcs represent
reaction substrates and products. The firing of a transition is the execution of a reaction, in which
substrates are consumed and products are created. Petri nets have been used to describe the
concurrent behavior of biochemical networks, including metabolic pathways and protein synthesis
[55, 96].

Another strength of Petri nets is that several successful extensions form a very versatile framework
with additional possibilities for modeling and analysis. For instance, in timed Petri nets,
transitions can be timed, which allows the timing of the system to be modeled as well. They have been
used to model and analyze signal transduction in an apoptosis pathway [57]. In colored Petri nets,
tokens of different colors denote multiple possible values for each place, so distinct activation
levels can be assigned to resources. They have been used to analyze metabolic pathways [96]. In
stochastic Petri nets, probabilities are attached to the choices among transitions to account for the
uncertainty of biomedical systems. They have been used to analyze signaling pathways [103, 115]. In
that setting, the number of molecules of a given type is represented by the tokens in a place, and
the probabilities represent reaction rates.

Rule-based Modeling

The combinatorial explosion that emerges from the complexity of multi-protein assemblies poses a
major barrier to the development of detailed, mechanistic models of biological systems. Modeling
approaches that require manually enumerating all potential species and reactions in a network, such
as differential equations, become impractical. To alleviate this problem, rule-based modeling
languages such as the BioNetGen language (BNGL) [83] and Kappa [11] have been developed. The key idea
of these languages is to represent interacting molecules as structured objects and to use
pattern-based rules to encode their interactions. This provides a rich yet concise description of
signaling proteins and their interactions. In other words, rule-based modeling specifies only those
components of a biological macromolecule that are directly involved in a biochemical transformation.
Moreover, reaction rules are defined as transformations of classes of species, which avoids the need
to specify one reaction for each possible state of a species.
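A small count shows why rules help. Consider a hypothetical receptor with k phosphorylation sites, each unphosphorylated (`u`) or phosphorylated (`p`), plus one docking site that is free or bound. The sketch enumerates all receptor species and expands a single pattern-based rule: "the adapter binds when site 0 is phosphorylated and the docking site is free". Each explicit reaction the rule stands for is listed. This illustrates the counting argument only and is not how BNGL or Kappa work internally.

```python
from itertools import product
from typing import List, Tuple

Species = Tuple[Tuple[str, ...], bool]  # (site states, docking site bound?)


def receptor_species(k: int) -> List[Species]:
    """All 2**k * 2 receptor states: each site 'u' or 'p', docking site free or bound."""
    return [(sites, bound) for sites in product("up", repeat=k) for bound in (False, True)]


def expand_binding_rule(k: int) -> List[Tuple[Species, Species]]:
    """Expand one rule into explicit reactions. The rule only tests site 0 and the
    docking site, so the other k - 1 sites are 'don't care' and every combination
    of them yields a separate reaction."""
    return [((sites, False), (sites, True))
            for sites, bound in receptor_species(k)
            if sites[0] == "p" and not bound]


for k in (2, 5, 10):
    print(k, len(receptor_species(k)), len(expand_binding_rule(k)))
# one rule stands for 2**(k - 1) reactions over 2**(k + 1) species
```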

Owing to their similarity to the chemical reaction notation widely used in systems biology,
rule-based languages have attracted a lot of attention among biologists. They have been applied to
model different cell signaling pathways and networks [26, 27, 35].

Because these rule-based languages were designed for describing molecular-level dynamics, one growing
need is to extend them to span multiple levels of biological organization. ML-Rules [160] is a
multi-level rule-based language that considers multiple biological levels by allowing objects to
contain collections of other objects. This embedding relationship can affect the behavior of both
container and contents, and allows users to describe both inter- and intracellular processes. Another
extension of BNGL is proposed in [219]. It enables the formal specification not only of the signaling
network within a single cell but also of interactions among multiple cells. ML-Rules uses continuous
rate equations to capture the dynamics of intracellular reactions. In contrast, this multiscale
language models intracellular dynamics using BNs, which reduces the difficulty of estimating the
hundreds of unknown parameters often involved in large models. It has been used to capture the
intra- and intercellular dynamics involved in the pancreatic cancer microenvironment [219].

A further need is to take spatial information into account in cell biological modeling. SRSim [105],
a spatial extension of BNGL, integrates BNGL with a three-dimensional coarse-grained simulation built
on the LAMMPS molecular dynamics simulator [177]. SRSim fills the gap between fine-grained molecular
dynamics (MD) simulation models, which do not allow the formulation of reaction networks, and 2-D or
3-D graph drawing tools, which offer no possibility for dynamic simulation. SRSim has been used to
model and analyze the human mitotic kinetochore [128]. Another spatial extension of BNGL is cBNGL
[111], in which structures and rules are associated with compartments and membranes. That is, cBNGL
distinguishes between three-dimensional (volume) and two-dimensional (surface) compartments. ML-Space
[34], a spatial variant of ML-Rules, considers compartmental dynamics, mesh-based approaches, and
individuals moving in continuous space. In ML-Space, species can be defined either as individual
particles that react upon collision or as a population residing in a small area. It has been used to
study the dynamics of lipid rafts and their role in receptor co-localization.

Hybrid Systems

Hybrid systems [4] are formal models that combine continuous and discrete dynamics in a piecewise
manner. Specifically, the state space of a hybrid system is organized by a finite set of discrete
modes. In each mode, the system evolves continuously according to a set of processes, generally
ordinary differential equations (ODEs) [68]. Transition conditions (guards) control the switch from
one mode to another, which can be followed by a ‘reset’ of the involved continuous variables. In
general, the temporal dynamics of a hybrid system is piecewise continuous.
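A minimal simulation of a two-mode hybrid system makes the mode/guard/reset structure concrete. The toggle-like system below is hypothetical. In mode `on`, x is produced (dx/dt = 1 - x); in mode `off`, it decays (dx/dt = -x); guards on x trigger the mode switches. The resets here are the identity, but any function of x could be used. Flows are integrated with a fixed-step explicit Euler scheme, which is only an approximation. Guards are checked only at grid points, so a jump may happen up to one step late.

```python
from typing import Callable, Dict, List, Tuple

Flow = Callable[[float], float]
Jump = Tuple[Callable[[float], bool], str, Callable[[float], float]]  # (guard, target, reset)

FLOWS: Dict[str, Flow] = {
    "on": lambda x: 1.0 - x,  # production toward 1
    "off": lambda x: -x,      # decay toward 0
}
JUMPS: Dict[str, Jump] = {
    "on": (lambda x: x >= 0.8, "off", lambda x: x),   # switch off at a high level
    "off": (lambda x: x <= 0.2, "on", lambda x: x),   # switch on at a low level
}


def simulate(x0: float = 0.0, mode: str = "on", t_end: float = 20.0,
             dt: float = 1e-3) -> List[Tuple[float, str, float]]:
    t, x = 0.0, x0
    trace = [(t, mode, x)]
    while t < t_end:
        guard, target, reset = JUMPS[mode]
        if guard(x):                 # discrete transition followed by a reset
            mode, x = target, reset(x)
        x += dt * FLOWS[mode](x)     # continuous evolution (explicit Euler step)
        t += dt
        trace.append((t, mode, x))
    return trace


trace = simulate()
modes = [m for _, m, _ in trace]
print(sum(a != b for a, b in zip(modes, modes[1:])))  # number of mode switches
```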

By using ODEs, one of the most powerful techniques for modeling system dynamics, hybrid systems aim to
bridge the gap between mathematical and computational models by combining the two. The continuous
part of a hybrid system, captured by differential equations, bears the closest relationship to the
underlying biochemical rate laws and can thus accurately model complex biological systems. The
discrete part of such models is the executable control mechanism that drives the hybrid system.

Hybrid systems are particularly suitable for modeling biological systems that exhibit clear switching
characteristics over time, such as the cell cycle; that is, the same system variables are regulated
by different processes in distinct discrete states. They have been successfully used to describe
biological systems at distinct levels. These include genetic regulatory networks [211], cell signaling
pathways [98], cell cycle control [56], the cardiac cell [228], bacteria-killing procedures [217],
and human ventricular action potentials in tissue [49].

Stochastic Hybrid Models

Stochastic hybrid systems (SHSs) are a class of dynamical systems that involve the interaction of
discrete, continuous, and stochastic dynamics. Owing to this generality, SHSs have been widely used in
systems biology. Examples include modeling subtilin production in Bacillus subtilis [125] and
personalized prostate cancer treatment [218]. To describe stochastic dynamics, uncertainty has been
added to hybrid systems in various ways, and a wealth of models has been proposed over the last
decade.

One approach expresses random initial values and stochastic dynamical coefficients using random
variables, resulting in hybrid automata (HAs) with parametric uncertainty [218]. When real-world
biological systems are modeled using hybrid models, parametric uncertainty arises naturally. Although
its causes are multifaceted, two factors are critical. First, probabilistic parameters are needed when
the physics controlling the system is known but some parameters are not known precisely, are
expected to vary because of individual differences, or may change over the system's operational
lifetime. Second, uncertainty may arise when the model is constructed directly from experimental
data. Because experimental measurements are imprecise, the values of system parameters may vary
within ranges with some associated likelihood of occurrence.
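Parametric uncertainty can be illustrated with plain Monte Carlo sampling: draw the uncertain parameter from its distribution, simulate, and record whether a property holds. The sketch assumes a hypothetical one-mode decay model dx/dt = -k x with k ~ Uniform(0.5, 1.5). It estimates the probability that x drops below 0.3 by time T = 1.5. This is only a sampling baseline, not the SMT-based analysis with statistical tests developed in this thesis, and it carries only the usual statistical error, not a formal guarantee.

```python
import random


def decays_below(k: float, x0: float = 1.0, threshold: float = 0.3,
                 t_end: float = 1.5, dt: float = 0.01) -> bool:
    """Euler-simulate dx/dt = -k x and report whether x(t_end) < threshold."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (-k * x)
    return x < threshold


def estimate(n: int, seed: int = 1) -> float:
    rng = random.Random(seed)
    hits = sum(decays_below(rng.uniform(0.5, 1.5)) for _ in range(n))  # sampled parameter
    return hits / n


p_hat = estimate(4000)
print(round(p_hat, 3))
# Analytically, exp(-1.5 k) < 0.3 iff k > ln(1/0.3)/1.5 ≈ 0.803, so p ≈ 0.70.
```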

Another class of models integrates deterministic flows with probabilistic jumps. When state changes
forced by continuous dynamics involve discrete random events, we refer to such systems as
probabilistic hybrid automata (PHAs) [200]. PHAs extend HAs with discrete probability distributions.
For discrete transitions, a PHA does not make a purely (non)deterministic choice over the set of
currently enabled jumps. Instead, it (non)deterministically chooses among the set of currently enabled
discrete probability distributions, each of which is defined over a set of transitions. Although
randomness only influences the discrete dynamics of the model, PHAs are still very useful and have
interesting practical applications [120]. One interesting variation of PHAs [218] allows additional
randomness in both transition probabilities and resets of system variables. Regarding jump
probabilities, the probabilities attached to the probabilistic jumps from a mode need not form a
discrete distribution with predefined constant probabilities. They can instead be expressed by
equations involving random variables whose distributions can be either discrete or continuous. This
extension is motivated by the fact that some transition probabilities vary due to factors such as
individual and environmental differences in real-world systems. Regarding resets, a system variable
can be reset to a value drawn from a known discrete or continuous distribution instead of being
assigned a fixed value. When continuous probabilistic events are also involved, we call these models
stochastic hybrid automata (SHAs) [192].
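The random jump probabilities and random resets of this PHA variant can be sketched as follows. The treatment step is hypothetical. A probability p is itself drawn from Uniform(0.6, 0.8). With probability p, the system jumps to `remission` and the tumor-size variable is reset to a random fraction of its value. Otherwise it jumps to `relapse` with an identity reset. All names and distributions are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Branch = Tuple[float, str, Callable[[float], float]]  # (probability, target mode, reset)


def choose_jump(rng: random.Random, branches: List[Branch]) -> Tuple[str, Callable[[float], float]]:
    """Sample one probabilistic jump; the branch probabilities must sum to 1."""
    u, acc = rng.random(), 0.0
    for prob, target, reset in branches:
        acc += prob
        if u < acc:
            return target, reset
    return branches[-1][1], branches[-1][2]  # absorb floating-point round-off


def treatment_jump(rng: random.Random, x: float) -> Tuple[str, float]:
    p = rng.uniform(0.6, 0.8)                                   # random jump probability
    branches: List[Branch] = [
        (p, "remission", lambda v: v * rng.uniform(0.1, 0.3)),  # random reset
        (1.0 - p, "relapse", lambda v: v),                      # identity reset
    ]
    target, reset = choose_jump(rng, branches)
    return target, reset(x)


rng = random.Random(0)
outcomes = [treatment_jump(rng, 10.0) for _ in range(10_000)]
print(sum(m == "remission" for m, _ in outcomes) / len(outcomes))  # close to E[p] = 0.7
```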

Other models replace deterministic flows with stochastic ones, such as stochastic differential
equations (SDEs) [18] and stochastic hybrid programs (SHPs) [175], where random perturbations affect
the dynamics continuously. Models that cover all of these ingredients include general stochastic
hybrid systems (GSHSs) [50, 124]. In the next section, we will show how to construct a stochastic
hybrid model of the effect of estrogen at different levels on species' population change in a
freshwater ecosystem.
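For SDE-based flows, the simplest numerical scheme is Euler-Maruyama. It adds a Gaussian increment with variance dt to each deterministic Euler step. The sketch uses a hypothetical Ornstein-Uhlenbeck process dX = -X dt + 0.5 dW. With the diffusion term set to zero, it reduces to the ordinary Euler method.

```python
import math
import random
from typing import Callable, List


def euler_maruyama(x0: float, drift: Callable[[float], float],
                   diffusion: Callable[[float], float], t_end: float,
                   dt: float, rng: random.Random) -> List[float]:
    """Simulate dX = drift(X) dt + diffusion(X) dW on a fixed time grid."""
    x, path = x0, [x0]
    for _ in range(int(round(t_end / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


rng = random.Random(42)
finals = [euler_maruyama(1.0, lambda x: -x, lambda x: 0.5, 1.0, 0.01, rng)[-1]
          for _ in range(2000)]
print(sum(finals) / len(finals))  # sample mean of X(1); the exact mean is exp(-1) ≈ 0.368
```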


1.2 Model Checking

Model Checking, a framework of powerful techniques for verifying finite-state systems, was developed
independently by Clarke and Emerson [66] and by Queille and Sifakis [180] in the early 1980s. Over the
last few decades, it has been successfully applied to numerous theoretical and practical problems
[52, EE, 114, 118, 158, 218]. Examples include the verification of sequential circuit designs,
communication protocols, software device drivers, security algorithms, cyber-physical systems, and
biological systems. Several major factors contribute to its success. Primarily, Model Checking is
fully automated. Unlike deductive reasoning with theorem provers, this ‘push-button’ method requires
neither proofs nor experts to check whether a finite-state model satisfies given system
specifications. Besides verifying correctness, it also permits bug detection. If a property does not
hold, a model checker can return a diagnostic counterexample: an actual execution of the given system
model leading to an error state. Such counterexamples can then help detect subtle bugs. Finally, from
a practical standpoint, Model Checking also works with partial specifications. This allows system
design and development to be separated from verification and debugging.
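The counterexample mechanism can be illustrated with an explicit-state checker for invariants, i.e., safety properties of the form "every reachable state satisfies P". Breadth-first search records a parent pointer for every discovered state. When a violating state is found, following the parent pointers back to an initial state yields a shortest counterexample. The two-process protocol is a hypothetical toy example. Real model checkers support full temporal logics and far more efficient state representations.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional

S = Hashable


def check_invariant(initial: Iterable[S], successors: Callable[[S], Iterable[S]],
                    invariant: Callable[[S], bool]) -> Optional[List[S]]:
    """Return None if the invariant holds in every reachable state,
    otherwise a shortest path from an initial state to a violating state."""
    parent: Dict[S, Optional[S]] = {}
    queue = deque()
    for s in initial:
        parent[s] = None
        queue.append(s)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            path = []
            while s is not None:  # follow parent pointers back to an initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


NEXT = {"idle": "wait", "wait": "crit", "crit": "idle"}


def successors(state, use_lock: bool):
    """Asynchronous interleaving: exactly one of the two processes moves."""
    for i in (0, 1):
        if use_lock and state[i] == "wait" and state[1 - i] == "crit":
            continue  # blocked until the other process leaves its critical section
        nxt = list(state)
        nxt[i] = NEXT[state[i]]
        yield tuple(nxt)


def mutex(s) -> bool:
    return s != ("crit", "crit")


print(check_invariant([("idle", "idle")], lambda s: successors(s, False), mutex))
print(check_invariant([("idle", "idle")], lambda s: successors(s, True), mutex))  # None
```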

Typically, a model checker has three basic components. The first is a modeling formalism used to
encode a state machine representing the system to be verified. The second is a specification
language based on temporal logics [178]. The third is a verification algorithm that exhaustively
searches the entire state space to determine whether the specification holds. Because of this
exhaustive search, all model checkers face an unavoidable problem in the worst case when applied to
complex systems: the number of global states can be enormous. Given n processes, each having m
states, their asynchronous composition may have m^n states, which is exponential in the number of
processes. In Model Checking, this is referred to as the State Explosion Problem. Great strides have
been made on this problem over the past 32 years for various types of real-world systems. In the
following, we discuss the major breakthroughs made during the development of Model Checking for the
formal analysis of various types of systems.
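The m^n growth is easy to reproduce. In the hypothetical sketch below, each of n processes cycles through m local states, and the asynchronous composition lets exactly one process move at a time. A breadth-first search then counts the reachable global states.

```python
from collections import deque


def reachable_count(n: int, m: int) -> int:
    """Count reachable global states of n asynchronous processes, each a cycle of m states."""
    start = (0,) * n
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for i in range(n):  # interleaving: one process moves per step
            t = s[:i] + ((s[i] + 1) % m,) + s[i + 1:]
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return len(seen)


for n in range(1, 7):
    print(n, reachable_count(n, 3))  # 3, 9, 27, 81, 243, 729
```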

Symbolic Model Checking with OBDDs

In the original implementation of the first model checking algorithm [66], the transition system was
represented explicitly using adjacency lists. Such an enumerative representation is feasible for
concurrent systems with small numbers of processes and states per process, but not for very large
transition systems. In the fall of 1987, McMillan made a fundamental breakthrough. He realized that
the original model checking procedure could be reformulated symbolically, representing sets of states
and sets of transitions rather than individual states and transitions. With this reformulation, Model
Checking could be used to verify much larger systems with more than 10^20 states [53]. The new
symbolic representation was based on Bryant's ordered binary decision diagrams (OBDDs) [46]. In this
symbolic approach, the state graphs that must be constructed in the explicit model checking procedure
are described by Boolean formulas represented as OBDDs. Model Checking algorithms can then work
directly on these OBDDs. Since OBDD-based algorithms are set-based, they cannot directly implement
depth-first search, and thus the property automaton must also be represented symbolically.
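The set-based style of computation can be sketched as a least-fixpoint reachability loop that manipulates whole sets of states through an image operator. Here Python frozensets stand in for OBDDs, so the sketch shows only the shape of the iteration. A symbolic model checker would encode the same sets and the transition relation as Boolean functions and compute the image with BDD operations (conjunction and existential quantification). The 3-bit counters are hypothetical examples.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[int, int]


def image(states: FrozenSet[int], relation: Set[Edge]) -> FrozenSet[int]:
    """Post-image: all successors of a set of states."""
    return frozenset(t for (s, t) in relation if s in states)


def reachable(init: Iterable[int], relation: Set[Edge]) -> FrozenSet[int]:
    """Least fixpoint of R = Init union Post(R), computed frontier by frontier."""
    reached = frozenset(init)
    frontier = reached
    while frontier:
        new = image(frontier, relation) - reached
        reached = reached | new
        frontier = new
    return reached


# Hypothetical 3-bit counters: one increments by 1, the other by 2 (mod 8).
step1 = {(s, (s + 1) % 8) for s in range(8)}
step2 = {(s, (s + 2) % 8) for s in range(8)}
print(sorted(reachable({0}, step1)))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(reachable({0}, step2)))  # [0, 2, 4, 6]
```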

Since then, various refinements of OBDD-based algorithms [36, 51, 110, 191] have pushed the size of
the state space up to more than 10^120 states [51]. The most widely used symbolic model checkers,
SMV [162], NuSMV [59], and VIS [39], are based on these ideas.

Partial Order Reduction

As mentioned above, the size of the parallel composition of n processes in a concurrent system may be
exponential in n. Verifying a property of such a system requires inspecting all states of the
underlying transition system. In particular, when there is no synchronization between the individual
processes, n! distinct orderings of the interleaved execution of n concurrently enabled transitions
need to be considered. This is even more serious for software verification than for hardware
verification, as software tends to be less structured than hardware. One of the most successful
techniques for dealing with asynchronous systems is partial order reduction. The effect of concurrent
actions is often independent of their ordering. This method therefore aims to reduce the number of
orderings considered, and thus the state space of the transition system that must be analyzed when
checking properties. Intuitively, if executing two events in either order yields the same result,
they are independent of each other. In this case, it is possible to avoid exploring certain paths in
the state transition system.
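The independence condition that partial order reduction relies on can be checked directly on small examples. The sketch below uses hypothetical actions on a state (x, y). On every state where both actions are enabled, it tests the two standard conditions: neither action disables the other, and executing them in either order yields the same state. Practical reduction methods (stubborn, persistent, or ample sets) use such independence relations, but they typically derive them conservatively from the model's structure rather than by enumerating states as done here.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

State = Tuple[int, int]
Action = Tuple[Callable[[State], bool], Callable[[State], State]]  # (guard, effect)


def independent(a: Action, b: Action, states: Iterable[State]) -> bool:
    (guard_a, eff_a), (guard_b, eff_b) = a, b
    for s in states:
        if guard_a(s) and guard_b(s):
            sa, sb = eff_a(s), eff_b(s)
            if not (guard_b(sa) and guard_a(sb)):  # one action disables the other
                return False
            if eff_b(sa) != eff_a(sb):             # the two orders do not commute
                return False
    return True


always = lambda s: True
inc_x: Action = (always, lambda s: (s[0] + 1, s[1]))
inc_y: Action = (always, lambda s: (s[0], s[1] + 1))
dbl_x: Action = (always, lambda s: (2 * s[0], s[1]))

states = list(product(range(3), repeat=2))
print(independent(inc_x, inc_y, states))  # True: the actions update different variables
print(independent(inc_x, dbl_x, states))  # False: (0 + 1) * 2 != 0 * 2 + 1
```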

Partial order reduction crucially relies on two assumptions. One is that all processes are fully
asynchronous. The other is that the property to be checked does not involve the intermediate states.
Realistic systems contain processes that may communicate and thus depend on one another. For such
systems, this approach attempts to identify path fragments of the full transition system that differ
only in the order of concurrently executed activities. In this way, the analysis of the state space
can be restricted to one (or a few) representatives of every possible interleaving.

Godefroid, Peled, and Valmari independently developed the concepts of incorporating partial order
reduction into Model Checking in the early 1990s. Valmari's stubborn sets [209], Godefroid's
persistent sets [99], and Peled's ample sets [171] differ in their details but share many ideas. The
SPIN model checker, developed by Holzmann [123], uses ample-set reduction to great advantage.

Bounded  Model  Checking 

Although Symbolic Model Checking (SMC) with OBDDs has substantially improved scalability and is
still widely used, OBDDs have several drawbacks that restrict the size of the models that can be checked
with this method. Since the variables must appear in the same order along every path from the root of
an OBDD to a leaf node, finding a space-efficient ordering is critical for this technique. Unfortunately,
it is often difficult, and sometimes impossible, to find an ordering that yields a small OBDD. Consider
the formula for the middle output bit of a combinational multiplier for two n-bit numbers: it can be
proved that, for every variable ordering, the size of the OBDD for this formula is exponential in n.
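The multiplier result is hard to reproduce in a few lines. The textbook function f = (x1 ∧ y1) ∨ ... ∨ (xn ∧ yn) is used here instead, because it shows how strongly OBDD size depends on the ordering. This example is our own choice, not the multiplier from the text, and unlike the multiplier it does have a good ordering. The sketch does not use a BDD library. It counts reduced-OBDD nodes directly by brute force: the number of nodes labelled with the i-th variable equals the number of distinct subfunctions, obtained by fixing the first i - 1 variables, that still depend on that variable.

```python
from itertools import product


def f(assign, n):
    """f = (x1 & y1) | (x2 & y2) | ... | (xn & yn)."""
    return any(assign[f"x{i}"] and assign[f"y{i}"] for i in range(1, n + 1))


def obdd_size(order, n):
    """Node count (including both terminals) of the reduced OBDD of f under `order`."""
    m = len(order)
    internal = 0
    for level in range(m):
        fixed, free = order[:level], order[level:]
        subfunctions = set()
        for prefix in product((False, True), repeat=level):
            base = dict(zip(fixed, prefix))
            # Truth table over the free variables; free[0] varies slowest.
            table = tuple(
                f({**base, **dict(zip(free, rest))}, n)
                for rest in product((False, True), repeat=m - level)
            )
            subfunctions.add(table)
        half = 2 ** (m - level - 1)
        # First half of a table has free[0] = False, second half free[0] = True;
        # only subfunctions that differ between the halves need a node here.
        internal += sum(1 for t in subfunctions if t[:half] != t[half:])
    return internal + 2


def interleaved(n):
    return [v for i in range(1, n + 1) for v in (f"x{i}", f"y{i}")]


def separated(n):
    return [f"x{i}" for i in range(1, n + 1)] + [f"y{i}" for i in range(1, n + 1)]
```

For n = 3 the interleaved ordering x1, y1, x2, y2, x3, y3 gives 8 nodes (2n + 2 in general), whereas x1, x2, x3, y1, y2, y3 gives 16 nodes (2^(n+1) in general).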

To further address the state explosion problem, Biere et al. proposed Bounded Model Checking (BMC)
using Boolean satisfiability (SAT) solvers [33]. The basic idea of BMC is quite straightforward. Given
a finite-state transition system, a temporal logic property, and a bound k (we assume k > 1), BMC
generates a propositional formula whose satisfiability implies the existence of a counterexample of
length k, and then passes this formula to a SAT solver. The formula encodes the constraints on the
initial states, the transition relation unrolled for k steps, and the negation of the given property. When
the formula is unsatisfiable (no counterexample is found), we can either increase the bound k until a
counterexample is found or k reaches an upper bound on how far the transition relation needs to be
unwound to guarantee completeness, or stop when resource constraints are exceeded. As an
industrial-strength model checking technique, BMC has been observed to surpass SMC with OBDDs
in quickly detecting counterexamples of minimal length, in saving memory, and in avoiding costly
dynamic variable reordering. With a fast SAT solver, BMC can handle designs that are orders of
magnitude larger than those handled by OBDD-based model checkers.
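The BMC loop just described can be sketched on a toy system. This sketch makes several assumptions. The transition system is a hypothetical 3-bit counter given by Python predicates. The property is the invariant "the counter never equals 5". Exhaustive enumeration of length-k paths stands in for the SAT solver. A real BMC tool would instead encode the same unrolled formula propositionally and hand it to a SAT (or SMT) solver.

```python
from itertools import product

STATES = range(8)  # hypothetical 3-bit counter


def init(s):
    return s == 0


def trans(s, t):
    return t == (s + 1) % 8


def bad(s):
    """Violations of the invariant 'the counter never equals 5'."""
    return s == 5


def find_counterexample(k):
    """Brute-force stand-in for a SAT call on the k-step unrolling

        I(s0) & T(s0, s1) & ... & T(s_{k-1}, s_k) & (bad(s0) | ... | bad(s_k)).
    """
    for path in product(STATES, repeat=k + 1):
        if (init(path[0])
                and all(trans(path[i], path[i + 1]) for i in range(k))
                and any(bad(s) for s in path)):
            return list(path)
    return None


def bmc(max_k):
    """Increase the bound until a counterexample is found or max_k is exceeded."""
    for k in range(max_k + 1):
        cex = find_counterexample(k)
        if cex is not None:
            return k, cex
    return None
```

Because k grows from 0, the first counterexample returned, `bmc(10) == (5, [0, 1, 2, 3, 4, 5])`, has minimal length. `bmc(4)` returns `None`, which by itself proves nothing about longer paths.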

As an efficient way of detecting subtle counterexamples, BMC is very useful for debugging. To prove
correctness when BMC finds no counterexample, an upper bound on the number of steps needed to
reach all reachable states must be determined. It has been shown that the diameter of the
state-transition system (i.e., the longest shortest path between any two states) can serve as such an
upper bound [33]. However, computing the diameter appears to be computationally hard when the
state-transition system is given implicitly. Other ways of making BMC complete are based on
induction [192], cube enlargement [161], Craig interpolants [163], and circuit co-factoring [93].
This problem remains a topic of active research.
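For an explicitly given graph, the diameter is easy to compute: run one breadth-first search from every state and take the largest distance found. The sketch below assumes the transition system is given as an adjacency dictionary with every state as a key. That is exactly the kind of representation that is unavailable in the implicit (symbolic) setting discussed above. The approach costs O(|S| · (|S| + |E|)) time.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distances from `source` to every state reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph):
    """Longest shortest path between any two states (t reachable from s)."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)
```

For a counter modulo 8, `{s: [(s + 1) % 8] for s in range(8)}`, the diameter is 7. Every reachable state is therefore reached within 7 steps, and unrolling to depth 7 suffices for checking an invariant on that system.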

An interesting variation of the original BMC is to use a Satisfiability Modulo Theories (SMT) solver
instead of a SAT solver [70, 205]. SMT encodings in model checking have several advantages. They
offer a more powerful specification language: they use (unquantified) first-order formulas instead of
Boolean formulas, and allow more natural and compact encodings, as there is no need to translate
high-level constraints into Boolean formulas. SMT encodings also let BMC work in the same way for
finite- and infinite-state systems. Above all, these advantages come without sacrificing the high level
of automation. CBMC is a widely used bounded model checker for ANSI-C and C++ programs [149],
with support for SMT solvers such as Z3 [73] and Yices [79].

Counterexample- Guided  Abstraction  Refinement 

When the state space of the model is enormous, or even infinite, an exhaustive search of the entire
space is infeasible. Another way of coping with the state explosion problem is to abstract away
details of the concrete state transition system that are irrelevant to the property under consideration
when constructing the model. We call this approach abstraction. This simplification incurs a loss of
information. Depending on how the information loss is controlled, abstraction techniques can be
classified as over-approximation or under-approximation techniques. Over-approximation methods
enrich the behavior of the system by relaxing constraints. They establish a relationship between the
abstract model and the original system such that the correctness of the former implies the
correctness of the latter. The downside is that they admit false negatives: properties that hold in the
original system may fail in the abstract model. Therefore, a counterexample found in the abstract
model may not correspond to a feasible execution of the original system. Such counterexamples are
called spurious. Conversely, under-approximation techniques, which admit false positives, obtain the
abstraction by removing irrelevant behavior from the system, so that a specification violation at the
abstract level implies a violation in the original system.

The counterexample-guided abstraction refinement (CEGAR) technique [63] integrates an
over-approximation technique, existential abstraction [67], and SMC into a unified, automatic
framework. Starting from an imprecise abstraction, it verifies universal properties and iteratively
refines the abstraction according to the spurious counterexamples returned. When a counterexample
is found, its feasibility in the original system is checked first. If the violation is feasible, the
counterexample is reported as a witness of a bug. Otherwise, a proof of infeasibility is used to refine
the abstraction. The procedure repeats these steps until either a real counterexample is reported or no
new counterexample is returned. When the property holds on the abstract model, the Property
Preservation Theorem [67] guarantees that it also holds in the concrete system. CEGAR is used in
many software model checkers, including Microsoft's SLAM project [22].
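A small explicit-state sketch ties together the over-approximation and spurious counterexamples from the previous paragraph and the CEGAR loop described here. Everything in it is hypothetical and chosen for illustration:

- the concrete system `SUCC` and the bad set `BAD` are toy examples;
- the abstraction is a partition of the concrete states that keeps bad states apart from the others (existential abstraction: an abstract edge exists if some concrete edge connects the two blocks);
- refinement uses a simple heuristic, splitting the abstract state where a spurious path breaks off into its reachable dead-end states and the rest, rather than the proof-based refinement used in practical tools.

```python
from collections import deque

# Hypothetical concrete system: successor sets and the set of bad states.
SUCC = {0: {1}, 1: {2}, 2: {2}, 3: {4}, 4: {5}, 5: {5}}
BAD = {5}


def post(states):
    """Concrete successors of a set of states."""
    return {t for s in states for t in SUCC[s]}


def abstract_path(blocks, init):
    """Shortest path in the existential abstraction from an initial to a bad block."""
    start = [i for i, b in enumerate(blocks) if b & init]
    parent = {i: None for i in start}
    queue = deque(start)
    while queue:
        i = queue.popleft()
        if blocks[i] & BAD:
            path = []
            while i is not None:
                path.append(i)
                i = parent[i]
            return path[::-1]
        succ_i = post(blocks[i])
        for j, block in enumerate(blocks):
            # Abstract edge i -> j iff some concrete state in block i has a successor in j.
            if j not in parent and succ_i & block:
                parent[j] = i
                queue.append(j)
    return None


def cegar(blocks, init):
    """Return ("property holds", None) or ("counterexample", concrete_trace)."""
    blocks = [set(b) for b in blocks]
    while True:
        path = abstract_path(blocks, init)
        if path is None:
            return "property holds", None
        # Feasibility check: concrete states that can follow the abstract path.
        frontier = [init & blocks[path[0]]]
        for i in range(1, len(path)):
            nxt = post(frontier[-1]) & blocks[path[i]]
            if not nxt:
                # Spurious: split the block where the path breaks into the
                # reachable dead-end states and the remaining states.
                k = path[i - 1]
                dead = frontier[-1]
                blocks[k:k + 1] = [dead, blocks[k] - dead]
                break
            frontier.append(nxt)
        else:
            # Real counterexample: walk backwards to pick one concrete trace.
            trace = [min(frontier[-1])]
            for states in reversed(frontier[:-1]):
                trace.append(min(s for s in states if trace[-1] in SUCC[s]))
            return "counterexample", trace[::-1]
```

Consider the initial partition {0, 1}, {2, 3}, {4}, {5} with initial state 0. The abstract path to {5} is spurious. Two refinements (splitting off {0}, then {2}) make {5} unreachable in the abstraction, so the property holds. Starting instead from {0, 3} yields the real counterexample 3 → 4 → 5.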

Model  Checking  for  Stochastic  Hybrid  Systems 

The popularity of SHSs in real-world applications has motivated significant research effort on the
foundations, analysis, and control of this class of systems. Among the various problems, one of the
most elementary questions in the quantitative analysis of SHSs is the probabilistic reachability
problem. There are two main reasons why it attracts researchers' attention. First, owing to the very
expressive hybrid modeling framework, most temporal properties can be reduced to reachability
problems. Second, probabilistic state reachability is a hard problem which is undecidable in general.
Intuitively, this class of problems asks for the probability of reaching a certain set of states. The set
may represent unsafe states that should be avoided or visited only with small probability, or, dually,
good states that should be visited frequently.
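SHSs have continuous state, but the quantity being computed is easiest to see on a finite discrete-time Markov chain (DTMC). The sketch below assumes a hypothetical three-state chain. It computes, for every state, the probability of reaching a target set within k steps using the standard backward recursion shown in the docstring.

```python
# Hypothetical three-state DTMC: transition probabilities out of each state.
P = {
    "ok": {"ok": 0.9, "warn": 0.1},
    "warn": {"ok": 0.5, "fail": 0.5},
    "fail": {"fail": 1.0},
}


def bounded_reach_prob(P, target, k):
    """Probability, for every state, of reaching `target` within k steps.

    p_0(s)     = 1 if s in target else 0
    p_{j+1}(s) = 1 if s in target else sum_t P(s, t) * p_j(t)
    """
    p = {s: 1.0 if s in target else 0.0 for s in P}
    for _ in range(k):
        p = {
            s: 1.0 if s in target else sum(pr * p[t] for t, pr in P[s].items())
            for s in P
        }
    return p
```

From "ok", the probability of reaching "fail" within 3 steps is 0.9 · 0.05 + 0.1 · 0.5 = 0.095. For SHSs, the sum over successor states becomes an integral over a continuous state space, which is one reason the problem is much harder there.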

Over the last decade, research efforts concerning SHSs have been increasing rapidly. At the same
time, model checking methods and tools for probabilistic systems, such as PRISM [150], MRMC cm,
and Ymer [230], have been proposed and developed. However, results on the analysis and
verification of SHSs are still limited. For instance, analysis approaches for GSHSs are often based
on  Monte-Carlo  simulation  |32l  182].  Considering  the  hardness  of  dealing  with  the  general  class, 
efforts  have  been  mainly  placed  on  different  subclasses  HI  13  EDI  HH 11081 11751 12001 12181 12321 
|23l. 

For a decidable subclass called probabilistic initialized rectangular automata (PIRAs), Sproston
proposed a model checking procedure against the probabilistic branching time logic (PBTL) [200].
The procedure first translates a PIRA into a probabilistic timed automaton (PTA), then constructs a
finite-state probabilistic region graph for the PTA, and finally applies existing PBTL model checking
techniques. For probabilistic rectangular automata (PRAs), which are less restricted than PIRAs,
Sproston proposed a semi-decision procedure based on a forward search through the reachable state
space [201].

For a more expressive class of models, probabilistic hybrid automata (PHAs), Zhang et al. abstracted
the original PHA into a probabilistic automaton (PA) and then applied established model checking
methods to the abstract model [232]. Hahn et al. also discussed an abstraction-based method in which
the given PHA is translated into an n-player stochastic game using two different abstraction
techniques [108]. All abstractions obtained by these methods are over-approximations, which means
that the maximum probability estimated for a safety property on the abstract model is no less than
that on the original model. Another proposed method is an SMT-based bounded model checking
procedure lf9Tl.

A similar class of models, widely used in control theory, is that of discrete-time stochastic hybrid
systems (DTSHSs) [fT51l. Like PHAs, DTSHSs comprise nondeterministic as well as discrete
probabilistic choices of state transitions. Unlike PHAs, DTSHSs are sampled at discrete time points,
use control inputs to model nondeterminism, have no explicit notion of symbolic transition guards,
and support a more general notion of randomness that can describe discretized stochastic differential
equations. With regard to system analysis, the control problem of interest is to find an optimal control
policy that minimizes the probability of reaching unsafe states. A backward recursive procedure based
on dynamic programming was proposed to solve this problem 0 |T5 ]. Another approach to a very
similar problem, in which the DTSHS model has no nondeterministic control inputs, was presented in
Compared to the former method, the latter approach uses a grid to construct a discrete-time Markov
chain (DTMC) and then applies standard model checking procedures to it. This approach was later
used in [9] as an analysis procedure for probabilistic reachability problems in the product of a DTSHS
and a Büchi automaton representing a linear temporal property. Zuliani et al. also described a
simulation-based method for model checking DTSHSs against bounded temporal properties [234].
We refer to this method as Statistical Model Checking (StatMC). The main idea of StatMC is to
generate sufficiently many simulations of the system, record the result returned by a trace checker
for each simulation, and then use statistical testing and estimation methods to determine, with a
predefined degree of confidence, whether the system satisfies the property. Although statistical model
checking is not an exhaustive state-space exploration method, it usually returns results faster than
exhaustive search, with a predefined, arbitrarily small error bound on the estimated probability.
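The estimation side of StatMC can be sketched as follows. Assumptions: the system is a hypothetical Gaussian random walk; the trace checker tests the bounded property "x stays below 3 for 10 steps"; the number of samples comes from the Chernoff-Hoeffding bound N ≥ ln(2/δ) / (2ε²). That bound is one common choice, and the cited work may use different (for example, Bayesian) tests. With N samples, the estimate is within ε of the true probability with confidence at least 1 - δ.

```python
import math
import random


def sample_size(epsilon, delta):
    """Chernoff-Hoeffding: N >= ln(2/delta) / (2 epsilon^2) gives P(|p_hat - p| >= epsilon) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate(rng, steps=10, sigma=1.0):
    """Hypothetical discrete-time stochastic system: a Gaussian random walk from 0."""
    x, trace = 0.0, [0.0]
    for _ in range(steps):
        x += rng.gauss(0.0, sigma)
        trace.append(x)
    return trace


def always_below(trace, bound=3.0):
    """Trace checker for the bounded property 'x < bound at every step'."""
    return all(x < bound for x in trace)


def estimate(epsilon=0.01, delta=0.05, seed=0, check=always_below):
    """Fraction of simulated traces satisfying `check`, and the number of samples used."""
    rng = random.Random(seed)
    n = sample_size(epsilon, delta)
    hits = sum(check(simulate(rng)) for _ in range(n))
    return hits / n, n
```

For ε = 0.01 and δ = 0.05 the bound requires 18,445 simulations. The required number does not depend on the size of the system's state space, which is why the approach scales to models that exhaustive methods cannot handle.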

In [92], stochastic hybrid automata (SHAs), an extension of PHAs, allow continuous probability
distributions in the discrete state transitions. In the verification procedure, a given SHA is first
over-approximated by a PHA by discretizing continuous distributions into discrete ones with the help
of additional uncountable nondeterminism. As mentioned, this over-approximation preserves safety
properties. In the second step, the verification procedure introduced in [232] is used to model check
the over-approximating PHA.

Another interesting line of work concerns stochastic hybrid programs (SHPs), introduced in [175].
This formalism is quite expressive with regard to randomness: it accounts for stochastic differential
equations, discrete probabilistic branching, and random assignments to real-valued variables. To
specify system properties, Platzer proposed a logic called stochastic differential dynamic logic,
together with a proof calculus for verifying logical properties of SHPs.


1.3  Overview  of  Contributions 

In this thesis, we focus on designing appropriate modeling formalisms and efficient analysis
algorithms for various biological systems along three thrusts:


• Modeling Formalisms: We design a multi-scale hybrid rule-based modeling formalism, extended
from the traditional rule-based language BioNetGen. This new language can describe intracellular
reactions and intercellular interactions simultaneously. Furthermore, owing to its hybrid character,
it requires less information about model parameters, such as reaction rates, to describe intracellular
reactions than traditional rule-based languages do. In a nutshell, our language can describe both
discrete and continuous models using a unified rule-based representation. The result is a modeling
framework that combines the advantages of logic and kinetic modeling approaches. Details can be
found in Chapter 6.

•  Formal  Analysis  Algorithms: 

- We develop a model checking algorithm for Qualitative Networks (QNs), a formalism for modeling
signal transduction networks in biology. Because qualitative networks lack initial states, one of
their unique features is that of “reducing reachability sets”. Our method exploits this feature and
combines it with over-approximation to compute decreasing sequences of reachability sets for QN
models, which results in a more scalable model checking algorithm for QNs. Details can be found in
Chapter 3.

- We propose a formal analysis method for probabilistic bounded reachability problems in two kinds
of stochastic hybrid systems: general hybrid systems with parametric uncertainty, and probabilistic
hybrid automata with additional randomness. Standard approaches to reachability problems for
linear hybrid systems require numerical solutions of large optimization problems, and become
infeasible for systems involving both nonlinear dynamics over the reals and stochasticity. Our
approach combines an SMT-based model checking technique with statistical tests in a sound
manner. Compared to standard simulation-based methods, it supports nondeterministic branching,
increases the coverage of simulation, and avoids the zero-crossing problem. Details can be found in
Chapter 5.

- We design a framework in which formal methods and machine learning techniques jointly
automate the construction, analysis, and refinement of models of biological and biomedical systems.
Model creation usually relies on intense human effort: model developers have to read hundreds of
published papers and hold numerous discussions with experts to understand the behavior of the
system and to construct the model. This laborious process makes model development slow, let
alone validating the model and extending it with the thousands of other possible components that
already exist in the published literature. Meanwhile, research results are published at a high rate,
and the published literature is voluminous, often fragmented, and sometimes even inconsistent. Our
framework automates information extraction from the literature, smart assembly into models, and
model analysis, enabling researchers to reuse and reason about previously published work in a
comprehensive and timely manner. Details can be found in Chapter 7.

•  Modeling  and  Applications: 

- To investigate whether and how formal modeling and analysis methods can contribute to the study
of biological systems, we construct a Boolean network model of the signaling network of a single
pancreatic cancer cell. Important system dynamics related to cell fate, the cell cycle, and oscillating
behaviors are formulated as CTL formulas. We then use the existing symbolic model checker
NuSMV to check the model against these CTL properties, confirming experimental observations and
thus validating our model. Details can be found in Chapter 2.

- To show the speedup offered by our improved bounded LTL model checking technique for QNs, we
build several QN models describing the cellular interactions during skin differentiation. By
comparing our method with an existing model checking technique for QNs, we show that our
method offers a significant acceleration, especially when analyzing large and complex models.
Details can be found in Chapter 3.

- We create a nonlinear hybrid model to depict a light-aided bacteria-killing process. Then, using a
recently proposed model checking technique based on δ-complete decision procedures, we found
that 1) the earlier the light is turned on after adding IPTG, the more quickly bacteria cells are killed;
2) to kill bacteria cells, the light has to be on for at least 4 time units; 3) the time difference between
removing the light and removing IPTG has an insignificant impact on the cell-killing outcome; and
4) the range of SOX concentrations necessary to kill bacteria cells can be broader than the range
indicated by our collaborating biologist. Details can be found in Chapter 4.

- We extend hybrid models for atrial fibrillation, prostate cancer treatment, and our bacteria-killing
process into stochastic hybrid models. We then apply our probabilistic bounded reachability
analyzer SReach to demonstrate its feasibility for model falsification, parameter estimation, and
sensitivity analysis. We also use SReach to perform bounded-time reachability analysis on the Tap
Withdrawal (TW) circuit model of C. elegans, estimating the probability of various TW responses
under parameter uncertainty and thus deriving the population percentages that exhibit various
behaviors in response to tap stimuli. This shows that SReach can handle large-scale systems for
which traditional reachability analysis may not scale. Details can be found in Section 5.4.

- To study the mechanisms underlying the tumor micro-environment during the development of
pancreatic cancer, we develop a multi-scale hybrid rule-based model of the pancreatic cancer
micro-environment and analyze it with statistical model checking. The formal analysis results
show that our model reproduces existing experimental findings on the mutual promotion between
pancreatic cancer and stellate cells. The results also explain why treatments aimed at different
targets lead to distinct outcomes. We then use our model to predict possible targets for drug
discovery. Details can be found in Chapter 6.
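The “reducing reachability sets” property of QNs mentioned above can be illustrated on a toy example. This sketch makes several assumptions:

- it uses the usual synchronous QN semantics, where each variable takes values in {0, ..., N} and moves one unit per step toward the value of its target function;
- the three-variable network and its target functions are hypothetical, of our own choosing;
- it is not the algorithm of Chapter 3. In particular, it enumerates states, which a scalable method must avoid.

Because there are no designated initial states, reachability starts from the full state space. R_{i+1} = post(R_i) can then only shrink: R_1 ⊆ R_0 holds trivially, and post is monotone. The per-variable (Cartesian) over-approximation shrinks for the same reason and always contains the exact sets.

```python
from itertools import product

N = 2  # hypothetical common range {0, ..., N} for every variable

# Hypothetical three-variable QN: a activates b, b activates c, c inhibits a.
TARGETS = {
    "a": lambda s: N - s["c"],
    "b": lambda s: s["a"],
    "c": lambda s: s["b"],
}
VARS = list(TARGETS)


def step(s):
    """Synchronous update: every variable moves one unit toward its target value."""
    return {v: s[v] + (TARGETS[v](s) > s[v]) - (TARGETS[v](s) < s[v]) for v in VARS}


def exact_sequence(k):
    """R_0 = all states, R_{i+1} = post(R_i); each R_i is a set of value tuples."""
    current = {x for x in product(range(N + 1), repeat=len(VARS))}
    seq = [current]
    for _ in range(k):
        current = {tuple(step(dict(zip(VARS, x))).values()) for x in current}
        seq.append(current)
    return seq


def box_sequence(k):
    """Over-approximation keeping one set of possible values per variable."""
    box = {v: set(range(N + 1)) for v in VARS}
    seq = [box]
    for _ in range(k):
        states = [dict(zip(VARS, x)) for x in product(*(sorted(box[v]) for v in VARS))]
        successors = [step(s) for s in states]
        box = {v: {t[v] for t in successors} for v in VARS}
        seq.append(box)
    return seq
```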
Chapter 2

Pancreatic  Cancer  Single  Cell  Model  as 
Boolean  Network  and  Symbolic  Model 
Checking 


Signal  transduction  is  a  process  for  cellular  communication  where  the  cell  receives  (and  responds 
to)  external  stimuli  from  other  cells  and  from  the  environment.  It  affects  most  of  the  basic  cell 
control  mechanisms  such  as  differentiation  and  apoptosis.  The  transduction  process  begins  with 
the  binding  of  an  extracellular  signaling  molecule  to  a  cell-surface  receptor.  The  signal  is  then 
propagated  and  amplified  inside  the  cell  through  signaling  cascades  that  involve  a  series  of  trigger 
reactions  such  as  protein  phosphorylation.  The  output  of  these  cascades  is  connected  to  gene 
regulation  in  order  to  control  cell  function.  Signal  transduction  pathways  are  able  to  crosstalk, 
forming  complex  signaling  networks. 

In this chapter, we investigate, within a pancreatic cancer cell, the functionality of six signaling
pathways that have been shown to be genetically altered in 100% of cases during the progression of
pancreatic cancer [133], and construct an in-silico Boolean network model that accounts for the
crosstalk among them [100, 101].

Pancreatic cancer (PC) is a highly aggressive malignancy and the 4th leading cause of cancer-related
death in the United States 0. It arises from pancreatic intraepithelial neoplasia (PanIN), a progression
of lesions that occur in the pancreatic ducts. It is characterized by a propensity for early local and
distant invasion (rapid growth and early metastasis) and by unresponsiveness to most conventional
treatments (it is highly resistant to chemotherapy and radiation). The global genomic analysis by
Vogelstein et al. [133] identified 12 cellular signaling pathways that are genetically altered in over
67% of pancreatic cancers. The study also found that PC contains an average of 63 genetic alterations,
and that the KRAS, apoptosis, TGFβ, Hedgehog, and Wnt/Notch signaling pathways, as well as the
regulation of the G1/S phase transition, are genetically altered in 100% of tumors. A number of
molecular and pathological analyses of evolving pancreatic adenocarcinoma revealed progressive
genetic mutations of KRAS, CDKN2A, TP53, and SMAD4, corresponding to mutations of the KRAS,
INK4a, ARF, P53, and SMAD4 proteins in the above-mentioned pathways. Mutations of oncoproteins
and tumor-suppressor proteins result in uncontrolled cell proliferation and evasion of apoptosis
(programmed cell death), eventually leading to cancer. In addition, PC overexpresses a number of
growth factors (GFs) and their respective receptors, including epidermal growth factor (EGF), sonic
hedgehog (SHH), WNT, transforming growth factor β (TGFβ), and insulin-like growth factor (IGF1)
or insulin. These growth factors can stimulate pancreatic cancer cell growth via autocrine and/or
paracrine feedback loops. In our model, we consider three important cell functions: proliferation,
apoptosis, and cell cycle arrest. Given this model, we are interested in verifying that sequences of
signal activations drive the network to a pre-specified state within a pre-specified time. We have
therefore applied symbolic model checking (SMC) to the model and shown that its behaviors are
qualitatively consistent with experiments. This demonstrates that SMC offers a powerful approach
for studying logical models of relevant biological processes.


2.1  Pancreatic  Cancer  Cell  Model 

Genomic analyses [133] have identified six cellular signaling pathways that are genetically altered
in 100% of pancreatic cancers: the KRAS, Hedgehog, Wnt/Notch, apoptosis, TGFβ, and G1/S phase
transition regulation pathways. In addition, many in vitro and in vivo experiments with pancreatic
cancer cells have found that several growth factors and cytokines, including IGF/insulin, EGF,
Hedgehog, WNT, Notch ligands, HMGB1, and TGFβ, and oncoproteins, including RAS, NFkB, and
SMAD7, are overexpressed [|T9l. We performed an extensive literature search and constructed a
signaling network model composed of the EGF-PI3K-P53, Insulin/IGF-KRAS-ERK, SHH-GLI,
HMGB1-NFkB, RB-E2F, WNT-β-catenin, Notch, TGFβ-SMAD, and apoptosis pathways. Our aim is to
study the interplay between tumor growth, cell cycle arrest, and apoptosis in the pancreatic cancer
cell. Figure 2.1 depicts the crosstalk model of the different signaling pathways in the pancreatic
cancer cell. Below, we first go through these pathways, focusing on their association with apoptosis,
cell cycle arrest, and tumor proliferation. In the following, the symbol → denotes activation (or
overexpression), while the symbol ⊣ denotes inhibition (or deactivation).

Insulin/IGF-KRAS-ERK pathway

Insulin/IGF → IR → KRAS → RAF → MEK → ERK → AP1, MYC. The overexpressed growth factors,
including insulin-like growth factor (IGF) and insulin, can activate the KRAS protein, resulting in the
phosphorylation of its downstream proteins RAF, MEK, and ERK [1751. These can phosphorylate or
activate the transcription factors AP1 and MYC, which activate the expression of the cell cycle
regulatory protein Cyclin D, enabling progression of the cell cycle through the G1 phase. KRAS is
mutated in over 90% of pancreatic cancers [23]. This pathway can also upregulate the expression
level of GLI in the sonic hedgehog pathway [133].

Figure 2.1: Schematic view of signal transduction in the pancreatic cancer model. Blue nodes
represent tumor-suppressor proteins, red nodes represent oncoproteins/lipids. Arrows represent
protein activation; circle-headed arrows represent deactivation.

EGF-PI3K-P53 pathway

There are two important downstream pathways: PI3K → PIP3 → AKT → MDM2 ⊣ P53 → P21, BAX,
and P53 → PTEN ⊣ PIP3 → AKT → MDM2 ⊣ P53. PC overexpresses a number of mitogenic growth
factors and receptor tyrosine kinases (RTKs), including EGF(R) and IGF(R), which can activate the
PI3K pathway to promote the growth of pancreatic cancer cells. The activation of PI3K initiates a
cascade of reactions including the phosphorylation of PIP2, AKT, and MDM2, leading to the
inhibition of P53's transcriptional activity in the nucleus 1 1131. The tumor-suppressor protein P53,
expressed in the later stage of PanIN, is mutated in more than 50% of pancreatic adenocarcinomas
[23]. Also, P53 is a transcription factor for many tumor-suppressor proteins, including PTEN and P21,
which negatively regulate the AKT pathway and induce cell cycle arrest, respectively.
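The sketch below turns the two cascades above into a small Boolean network and enumerates its long-term behaviors (attractors) by brute force. The update rules are hypothetical: one synchronous rule per node, with PI3K treated as a constant input. They are our own reading of the arrow notation, not the rules of the model in this chapter. Note that the loop P53 → PTEN ⊣ PIP3 → AKT → MDM2 ⊣ P53 contains two inhibitions and is therefore a positive feedback loop.

```python
from itertools import product

NODES = ["PIP3", "AKT", "MDM2", "P53", "PTEN"]


def update(state, pi3k=True):
    """Hypothetical synchronous Boolean rules read off the two cascades above."""
    s = dict(zip(NODES, state))
    return (
        pi3k and not s["PTEN"],  # PI3K -> PIP3, PTEN -| PIP3
        s["PIP3"],               # PIP3 -> AKT
        s["AKT"],                # AKT -> MDM2
        not s["MDM2"],           # MDM2 -| P53
        s["P53"],                # P53 -> PTEN
    )


def attractors(pi3k=True):
    """Iterate from every state until a state repeats; collect the cycles reached."""
    found = set()
    for start in product((False, True), repeat=len(NODES)):
        seen, s = [], start
        while s not in seen:
            seen.append(s)
            s = update(s, pi3k)
        found.add(frozenset(seen[seen.index(s):]))
    return found
```

With PI3K on, the toy network has two fixed points: one with P53 and PTEN active, and one with PIP3, AKT, and MDM2 active. It also has six cyclic attractors of length 5, which come from the synchronous update of the feedback loop. With PI3K off, every state converges to the P53-active fixed point.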


RB-E2F  pathway 

CyclinD ⊣ RB ⊣ E2F → CyclinE. This pathway regulates cell cycle progression from the G1 phase to the
S phase, which is induced by the Cyclin E-CDK2 complex. In a normal cell, unphosphorylated RB, a
tumor-suppressor protein, binds to E2F and inhibits its transcriptional activity. E2F is activated when
its inhibitor RB is phosphorylated and inhibited by CyclinD, promoting the transcription of CyclinE
[227]. Germline mutations of CDKN2A in this pathway, which encodes the tumor suppressors INK4a
(an inhibitor of CyclinD-CDK4/6) and ARF (which inhibits MDM2's activity to stabilize P53), were
found in up to 90% of pancreatic cancers [23].
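The kind of property studied in this chapter, namely that a sequence of signal activations drives the network to a given state within a given time, can be sketched on the RB-E2F cascade alone. Assumptions: the synchronous Boolean rules below are hypothetical and read off the arrow notation, and CyclinD is a constant input. The check uses exhaustive enumeration rather than symbolic model checking. For every initial combination of RB, E2F, and CyclinE, it tests whether CyclinE becomes active within k steps, a bounded analogue of the CTL formula AF CyclinE.

```python
from itertools import product


def step(state, cyclin_d):
    """Hypothetical synchronous rules: CyclinD -| RB, RB -| E2F, E2F -> CyclinE."""
    rb, e2f, _cyclin_e = state
    return (not cyclin_d, not rb, e2f)


def reaches_within(state, k, cyclin_d):
    """Is CyclinE active at some step 0..k when starting from `state`?"""
    for _ in range(k + 1):
        if state[2]:
            return True
        state = step(state, cyclin_d)
    return False


def holds_for_all(k, cyclin_d=True):
    """Check the bounded property from every combination of (RB, E2F, CyclinE)."""
    return all(reaches_within(s, k, cyclin_d) for s in product((False, True), repeat=3))
```

With CyclinD on, the property holds for k = 3 but not for k = 2. Starting from RB active with E2F and CyclinE inactive, the signal must first switch RB off and then E2F on. With CyclinD off, the property fails, because that same state is a fixed point.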


SHH-GLI  pathway 

It is composed of two main parts: 1) SHH ⊣ PTCH ⊣ SMO → GLI → IGF, WNT, CyclinD, PTCH; and
2) AKT ⊣ PKA ⊣ GLI. The sonic hedgehog (SHH) protein and its receptor Smoothened (SMO) are
activated and overexpressed in later-stage pancreatic carcinomas, which occurs in over 70% of PCs
[203]. In a quiescent cell without SHH, SMO is bound and inhibited by the tumor-suppressor protein
patched (PTCH). Once SHH binds to PTCH, SMO is released to activate the glioma-associated
oncogene homologue (GLI1/2/3), yielding an active transcription factor. In the absence of SHH,
protein kinase A (PKA) and CKI (only PKA is shown in Fig. 2.1) transform GLI into a repressor form
that can inhibit GLI's transcriptional activity. The activation of the SHH-GLI pathway is associated
with tumor proliferation and pancreatic cancer-associated fibroblasts [216]. The expression of GLI
can also be up-regulated by the PI3K-AKT and KRAS-ERK pathways, independently of SHH
activation. In particular, SHH signaling alone is sufficient to drive pancreatic neoplasia, but does not
form pancreatic adenocarcinomas [203].


WNT  pathway 

WNT → FZD → DVL ⊣ GSK3β ⊣ β-catenin → TCF → CyclinD. WNT pathway activation and the
overexpression of several pathway components were observed in 65% of pancreatic adenocarcinomas
[231]. When the WNT protein is absent, β-catenin is localized in the cytoplasm, bound to and inhibited
by complexes composed of Axin, APC, and GSK3β [225]. The canonical WNT pathway is activated by
the interaction of WNT and Frizzled (FZD) proteins, which can destabilize the Axin-APC-GSK3
complex and translocate β-catenin to the nucleus, where it activates the TCF-LEF transcription
factors [212].


Notch  pathway 

DLL → Notch → NICD → CyclinD. The Notch pathway is activated by the binding of transmembrane ligands, including DLL (Delta-like 1, 3, 4) and Jagged 1-2, to Notch proteins. Notch is then cleaved, releasing the Notch intracellular domain (NICD), which translocates to the nucleus to induce the expression of several target genes, including the cell regulatory protein CyclinD. Recent findings indicate that the Notch pathway is involved in the development of pancreatic cancer [48].

HMGB1-NFkB pathway

Signaling → IKK ⊣ IkB ⊣ NFkB → A20, IkB, Bcl-XL, GLI. A recent study [136] found that the overexpression of HMGB1 can promote the growth of pancreatic cancer cells by activating the RAGE pathway. In the resting cell, NFkB is located in the cytoplasm, bound to and inhibited by IkB. Once activated by HMGB1, the IkB kinase (IKK) phosphorylates and deactivates IkB, leading to the translocation of NFkB into the nucleus, where it promotes the transcription of a number of genes, including CyclinD, the anti-apoptotic protein Bcl-XL, its own inhibitors A20 and IkB [122, 210], and HMGB1 lU37l.

TGFβ-SMAD pathway

It has two main parts: 1) TGFβ → TGFR → SMAD2/3/4 → P21; and 2) TGFβ → TGFR → PI3K-RAS pathway. The TGFβ-SMAD signaling pathway can inhibit the growth of normal human epithelial cells. When the TGFβ ligand binds to type II TGFβ receptors (TGFR), type I receptors are activated, leading to the phosphorylation of the cytoplasmic SMAD2/3 proteins. The SMAD2/3 proteins form a complex with SMAD4 and translocate into the nucleus to activate several transcription factors, upregulating the expression of cyclin-dependent kinase (CDK) inhibitors, including P21 [76, 80]. SMAD4 was found to be either mutated or deleted in over 50% of pancreatic cancers, with the loss occurring in the later-stage PanINs [131]. In addition to the SMAD-dependent signaling pathway, TGFβ also activates the PI3K-RAS pathway, leading to crosstalk with the WNT and EGF pathways. Impairment of the TGFβ-SMAD pathway promotes cell proliferation and contributes to carcinogenesis.

Apoptosis pathway

P53 → BAD, BAX, Apaf1 → cytochrome C → Cas3. The apoptosis pathway is regulated by both anti-apoptotic (e.g., Bcl-XL) and pro-apoptotic members of the Bcl-2 family of proteins [187]. The activation of P53 induces or upregulates the transcription of several pro-apoptotic proteins, including BAX, BAD, and Apaf1 (apoptotic protease activating factor 1). After receiving pro-apoptotic signals from P53, BAD inhibits the anti-apoptotic effects of Bcl-XL, while this process is inhibited by pro-survival signals from AKT. BAX is a protein of the Bcl-2 family that activates the apoptosis process by promoting the release of cytochrome C (Cyto-C) from the mitochondrion. This, in turn, promotes the formation of the apoptosome complex (APC) [183], which contains Cyto-C and Apaf1. Cas3 is an apoptosis effector caspase (cysteine-dependent aspartate-specific protease) that cleaves proteins in the execution phase of cell apoptosis [199]. The activation of Cas3 is promoted by APC and inhibited by the inhibitors of apoptosis (IAP). It has been found that Cas3 is mutated in many cancer types [199].


2.2  Boolean  Network 

In this section, we translate the above signaling pathways into a Boolean network model. The input signals of the model are different growth factors, including SHH, EGF, and TGF. The output signals are Apoptosis, (Cell) Proliferation, and (Cell Cycle) Arrest.

In the Boolean network model of the pancreatic cancer cell, each node represents a protein or lipid in the signaling pathway. At any specific time, each node is in either the ON (1) or the OFF (0) state. The state evolution of a node from time t to t + 1 is described by a Boolean transfer function, which in general depends on the states of the neighboring nodes. In this chapter we use several forms of transfer function. In one form, we assume that a node is activated (inhibited) if its incoming neighbor is active (inhibited). This form is used, for example, for receptor nodes such as EGFR, which are expressed only if their upstream ligand is present. A dual form assumes that a node is activated (inhibited) when its incoming neighbor is inhibited (activated). This form is used, e.g., for SMO, which is bound and inhibited by PTCH (see the description of the SHH-GLI pathway above).

In another form, we assume that neighboring nodes are classified as activators or inhibitors. Activator nodes can change the state of a node n if and only if no inhibitor acting on node n is in the ON state. This assumption is motivated by the fact that many tumor-suppressor proteins, including P53, PTEN, SMAD4, INK4a, and ARF, are either lost or mutated in the early or late stages of PDAC, while oncoproteins such as KRAS, NFkB, and GLI are continuously activated or overexpressed. The transfer function for node n can be written as

$$n(t+1) = \Big(n(t) \vee \bigvee_{a \in A(n)} a(t)\Big) \wedge \neg\Big(\bigvee_{i \in I(n)} i(t)\Big) \qquad (2.1)$$

where $A(n)$ and $I(n)$ are the activators and inhibitors of node $n$, respectively.

In  our  model,  we  assume  synchronous  state  update  for  all  the  nodes  in  the  network.  That  is,  at 
any  time  step  the  state  of  each  node  in  the  model  is  updated  according  to  its  transfer  function.  In  the 
future  we  plan  to  study  asynchronous  models,  to  take  into  account  the  observation  that  biological 
processes  may  evolve  at  different  speeds.  We  remark  that  our  verification  approach  would  still 
work,  since  Model  Checking  can  cope  with  asynchronous  systems. 
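To make Equation (2.1) and the synchronous update concrete, the following Python sketch applies the activator/inhibitor transfer function to every node at once. It is an illustration under stated assumptions, not the 61-node model of this chapter: the node names `X`, `Y`, `Z` and the dictionaries `ACTIVATORS` and `INHIBITORS` are hypothetical, the identity and dual forms described above are not shown, and a node with no activators simply keeps its value unless inhibited, as Equation (2.1) implies.

```python
from typing import Dict, List

State = Dict[str, bool]

# Hypothetical toy fragment (not the 61-node model): X activates Z, Y inhibits Z.
ACTIVATORS: Dict[str, List[str]] = {"Z": ["X"]}
INHIBITORS: Dict[str, List[str]] = {"Z": ["Y"]}


def transfer(node: str, state: State) -> bool:
    """Equation (2.1): (n(t) OR any activator) AND NOT (any inhibitor)."""
    promoted = state[node] or any(state[a] for a in ACTIVATORS.get(node, []))
    inhibited = any(state[i] for i in INHIBITORS.get(node, []))
    return promoted and not inhibited


def sync_step(state: State) -> State:
    """Synchronous update: every node reads the same time-t state."""
    return {node: transfer(node, state) for node in state}


if __name__ == "__main__":
    s = {"X": True, "Y": False, "Z": False}
    print(sync_step(s))                 # Z is switched on by X
    print(sync_step({**s, "Y": True}))  # Y keeps Z off regardless of X
```

Because every new value is computed from the old dictionary, the order in which nodes are visited does not matter, which is exactly what synchronous update requires.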

The Boolean network in Figure 2.1 comprises 61 nodes, including 7 control (input) nodes and 3 output nodes. We emphasize that the structure depicted in Figure 2.1 is not a state transition graph; rather, it represents the wiring diagram of our model. Since each node is a Boolean variable, the state space of the model has cardinality $2^{61}$. Is this a correct model of the proliferation and apoptosis of a pancreatic cancer cell? To answer this question, we use symbolic model checking of Computation Tree Logic (CTL) properties, which we introduce next.


2.3  Symbolic  Model  Checking 

Given our pancreatic single-cell model, we express its intended behavior as Computation Tree Logic (CTL) [165] formulas. Then, we apply symbolic model checking to verify the model against these formulas. Here, we give a brief introduction to CTL and symbolic model checking.

Kripke  structure 

A finite state system can be described as a tuple

$$M = \langle S, \mathcal{I}, \mathcal{R}, \mathcal{L} \rangle$$

where $S$ is a finite set of states, $\mathcal{I} \subseteq S$ is the set of initial states, and $\mathcal{R} \subseteq S \times S$ is the transition relation, specifying the possible transitions from state to state. $\mathcal{L}$ is a function that labels states with the atomic propositions from a given language. Such a tuple is called a state transition system or Kripke structure [148].

Computation Tree Logic

Temporal logics are used to reason about the behaviors defined by Kripke structures. A behavior in a Kripke structure is obtained by starting from a state $s \in \mathcal{I}$ and then repeatedly appending states reachable through $\mathcal{R}$. We require that the transition relation $\mathcal{R}$ be total; that is, for each state $s \in S$ there exists a state $s' \in S$ such that $(s, s') \in \mathcal{R}$. As a consequence, all the behaviors of the system are infinite. Since a state can have more than one successor, the structure can be thought of as unwinding into an infinite tree, representing all the possible executions of the system starting from the initial states.

Two useful temporal logics are Computation Tree Logic (CTL) and Linear Temporal Logic (LTL). They differ in how they handle branching in the underlying computation tree. In CTL, each temporal operator is preceded by a path quantifier (E, "there exists a path", or A, "for all paths"), so it is possible to quantify over the paths departing from a given state. In LTL, formulas describe properties that must hold along all possible computation paths. For our pancreatic cancer cell model, we use CTL to describe the corresponding system properties.

The syntax of CTL formulas is given by the following rules:

• any atomic proposition is a CTL formula;

• if $\alpha$ and $\beta$ are CTL formulas, then $\alpha \circ \beta$ and $\neg\alpha$ are CTL formulas, where $\circ$ is any Boolean connective ($\wedge$, $\vee$, ...); and

• if $\alpha$ and $\beta$ are CTL formulas, then $\mathrm{EX}\,\alpha$, $\mathrm{EG}\,\alpha$, and $\mathrm{E}[\alpha\,\mathrm{U}\,\beta]$ are CTL formulas.

The intuitive meaning of these CTL formulas is as follows.

• $\mathrm{EX}\,\alpha$ means that there exists (E) a path starting from a state $s_0 \in S$ in which $\alpha$ holds in the next (X) state.

• $\mathrm{EG}\,\alpha$ means that there exists a path starting from a state $s_0$ in which $\alpha$ holds globally (G).

• $\mathrm{E}[\alpha\,\mathrm{U}\,\beta]$ means that there exists a path starting from a state $s_0$ in which $\alpha$ holds until (U) $\beta$ holds.

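The other CTL operators (e.g., $\mathrm{AF}\,\alpha$, meaning that on all paths $\alpha$ eventually holds) can be derived from these three as follows.

• $\mathrm{AX}\,\alpha = \neg\mathrm{EX}(\neg\alpha)$ means that, for all paths, $\alpha$ holds in the next state.

• $\mathrm{EF}\,\alpha = \mathrm{E}[\mathit{true}\,\mathrm{U}\,\alpha]$ means that there exists a path in which $\alpha$ eventually holds.

• $\mathrm{AG}\,\alpha = \neg\mathrm{EF}(\neg\alpha)$ means that $\alpha$ holds invariantly.

• $\mathrm{A}[\alpha\,\mathrm{U}\,\beta] = \neg\mathrm{E}[\neg\beta\,\mathrm{U}\,(\neg\alpha \wedge \neg\beta)] \wedge \neg\mathrm{EG}\,\neg\beta$ means that, for all paths, $\alpha$ holds until $\beta$.

• $\mathrm{AF}\,\alpha = \mathrm{A}[\mathit{true}\,\mathrm{U}\,\alpha]$ means that, for all paths, $\alpha$ eventually holds.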

Symbolic  CTL  Model  Checking 

The model checking algorithm applied to a CTL formula $\phi$ works by recursively labeling the states of the state graph with the sub-formulas of $\phi$: proceeding from the innermost sub-formulas outward, it computes the truth value of each sub-formula in every state from its CTL operator and the truth values of its own sub-formulas. In the original model checking algorithm, the state transitions were represented explicitly, which can lead to state explosion. The main idea behind symbolic model checking is to represent and manipulate a finite state-transition system symbolically as Boolean functions, so as to alleviate the state explosion problem. In particular, ordered binary decision diagrams (OBDDs) [46] are a canonical form for Boolean formulas (for a fixed variable ordering). OBDDs are often substantially more compact than traditional normal forms, and they can be manipulated very efficiently.
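The labeling procedure can be illustrated with an explicit-state Python sketch in which the set of states satisfying each sub-formula is computed as a fixpoint: $\mathrm{E}[\alpha\,\mathrm{U}\,\beta]$ as the least fixpoint of $Z = \beta \cup (\alpha \cap \mathrm{EX}\,Z)$, and $\mathrm{EG}\,\alpha$ as the greatest fixpoint of $Z = \alpha \cap \mathrm{EX}\,Z$; the other operators follow from the derivations above. This is not NuSMV's implementation: a symbolic checker performs the same iterations on BDDs rather than on explicit sets. The successor map `succ` and the integer states are assumptions for illustration, and the map must be total, as required above.

```python
from typing import Dict, List, Set

Succ = Dict[int, List[int]]  # hypothetical explicit, total transition relation


def ex(succ: Succ, sat: Set[int]) -> Set[int]:
    """EX phi: states with at least one successor in sat."""
    return {s for s, nexts in succ.items() if any(t in sat for t in nexts)}


def eu(succ: Succ, sat1: Set[int], sat2: Set[int]) -> Set[int]:
    """E[phi1 U phi2]: least fixpoint of Z = sat2 | (sat1 & EX Z)."""
    z = set(sat2)
    while True:
        new = z | (sat1 & ex(succ, z))
        if new == z:
            return z
        z = new


def eg(succ: Succ, sat: Set[int]) -> Set[int]:
    """EG phi: greatest fixpoint of Z = sat & EX Z."""
    z = set(sat)
    while True:
        new = z & ex(succ, z)
        if new == z:
            return z
        z = new


def af(succ: Succ, sat: Set[int]) -> Set[int]:
    """AF phi = not EG (not phi), equivalent to A[true U phi]."""
    every = set(succ)
    return every - eg(succ, every - sat)


def ag(succ: Succ, sat: Set[int]) -> Set[int]:
    """AG phi = not EF (not phi), where EF psi = E[true U psi]."""
    every = set(succ)
    return every - eu(succ, every, every - sat)


if __name__ == "__main__":
    # State 3 can loop on itself forever, so AF {2} fails there; 0 -> 1 -> 2 reaches 2.
    succ = {0: [1], 1: [2], 2: [2], 3: [3, 0]}
    print(sorted(af(succ, {2})))  # [0, 1, 2]
```

Nested formulas such as AF AG p are handled by feeding the result set of the inner call into the outer one, which is the recursive labeling described in the text.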

We consider Boolean formulas over $n$ variables $x_1, \ldots, x_n$. A binary decision diagram (BDD) is a rooted directed acyclic graph with two types of vertices: terminal vertices and nonterminal vertices. Each nonterminal vertex $v$ is labeled by a variable $\mathit{var}(v)$ and has two successors, $\mathit{low}(v)$ and $\mathit{high}(v)$. Each terminal vertex $v$ is labeled by either 0 or 1, denoted $\mathit{value}(v)$. A BDD with root $v$ determines a Boolean function $f_v(x_1, \ldots, x_n)$ in the following manner.

• If $v$ is a terminal vertex, then $f_v(x_1, \ldots, x_n) = \mathit{value}(v)$.

• If $v$ is a nonterminal vertex with $\mathit{var}(v) = x_i$, then $f_v(x_1, \ldots, x_n)$ is given by

$$\big(\neg x_i \wedge f_{\mathit{low}(v)}(x_1, \ldots, x_n)\big) \vee \big(x_i \wedge f_{\mathit{high}(v)}(x_1, \ldots, x_n)\big)$$


In an OBDD, the variables respect the same strict total order $x_1 < x_2 < \cdots < x_n$ along every path from the root to the terminals (some variables may be skipped). Given an assignment to the variables $x_1, \ldots, x_n$, the value of the formula can be decided by traversing the OBDD from the root to a terminal: at each node, the branch (low or high) is chosen by the value assigned to the variable that labels the node.
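The traversal just described takes only a few lines. In this illustrative Python sketch, a nonterminal vertex stores $\mathit{var}(v)$, $\mathit{low}(v)$, and $\mathit{high}(v)$, terminal vertices are the constants `False` and `True`, and the class name `Node` is our own choice rather than part of any BDD library. The example encodes $x_1 \oplus x_2$ under the order $x_1 < x_2$.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    var: str     # var(v): variable tested at this vertex
    low: "BDD"   # low(v): followed when var(v) = 0
    high: "BDD"  # high(v): followed when var(v) = 1


BDD = Union[bool, Node]  # terminal vertices are False (0) and True (1)


def evaluate(v: BDD, assignment: Dict[str, bool]) -> bool:
    """Walk from the root to a terminal, branching on each vertex's variable."""
    while isinstance(v, Node):
        v = v.high if assignment[v.var] else v.low
    return v


# x1 XOR x2 under the ordering x1 < x2.
XOR = Node("x1", Node("x2", False, True), Node("x2", True, False))

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, evaluate(XOR, {"x1": a, "x2": b}))
```

Evaluation visits at most one vertex per variable, so its cost is linear in $n$ regardless of the size of the diagram.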

There exist efficient algorithms for operating on OBDDs. All sixteen two-argument logical operations can be implemented efficiently on Boolean functions represented as OBDDs; in particular, the complexity of these operations is linear in the product of the sizes of the argument OBDDs. The key idea for an efficient implementation of these operations is the Shannon expansion $f = (\neg x \wedge f|_{x=0}) \vee (x \wedge f|_{x=1})$. In [46], Bryant gave a uniform procedure for computing all 16 logical operations.
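The following Python sketch shows how a single recursive procedure, parameterized by the Boolean operator, can implement any of the sixteen binary operations by Shannon expansion on the top-most variable, in the spirit of Bryant's procedure. It is a simplified illustration, not a production BDD package: nodes are plain tuples `(index, low, high)` over a hypothetical fixed order `ORDER`, the helper `mk` drops redundant tests, and memoization on argument pairs bounds the number of recursive calls by the product of the argument sizes.

```python
from typing import Callable, Dict, Optional, Tuple, Union

ORDER = ["x1", "x2", "x3"]                    # hypothetical fixed variable order
BDD = Union[bool, Tuple[int, "BDD", "BDD"]]   # terminal, or (var index, low, high)


def mk(i: int, low: BDD, high: BDD) -> BDD:
    """Build a reduced node: drop a test whose two branches coincide."""
    return low if low == high else (i, low, high)


def var(name: str) -> BDD:
    return mk(ORDER.index(name), False, True)


def top(u: BDD) -> int:
    return u[0] if isinstance(u, tuple) else len(ORDER)  # terminals come last


def bdd_apply(op: Callable[[bool, bool], bool], u: BDD, v: BDD,
              memo: Optional[Dict[Tuple[BDD, BDD], BDD]] = None) -> BDD:
    """Combine two OBDDs with op, using f = (!x & f|x=0) | (x & f|x=1)."""
    if memo is None:
        memo = {}
    if isinstance(u, bool) and isinstance(v, bool):
        return op(u, v)
    if (u, v) in memo:
        return memo[(u, v)]
    i = min(top(u), top(v))
    u0, u1 = (u[1], u[2]) if top(u) == i else (u, u)  # cofactors of u w.r.t. x_i
    v0, v1 = (v[1], v[2]) if top(v) == i else (v, v)  # cofactors of v w.r.t. x_i
    result = mk(i, bdd_apply(op, u0, v0, memo), bdd_apply(op, u1, v1, memo))
    memo[(u, v)] = result
    return result


def evaluate(u: BDD, env: Dict[str, bool]) -> bool:
    while isinstance(u, tuple):
        u = u[2] if env[ORDER[u[0]]] else u[1]
    return u
```

For example, `bdd_apply(lambda a, b: a and b, var("x1"), var("x2"))` builds the diagram of $x_1 \wedge x_2$; passing a different two-argument function yields any of the other fifteen operations with no change to the procedure.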

McMillan developed the symbolic CTL model checking algorithm using BDDs [162]. This algorithm can handle much larger concurrent systems than explicit-state model checking [54].
State transition systems can be represented with BDDs as follows. First, we represent the states in terms of $n$ Boolean state variables $v = \{v_1, v_2, \ldots, v_n\}$. Then, we express the transition relation $R$ as a Boolean formula in terms of the state variables:

$$f_R(v_1, \ldots, v_n, v_1', \ldots, v_n') = 1 \iff R(v_1, \ldots, v_n, v_1', \ldots, v_n')$$

where $v_1, \ldots, v_n$ represent the current state and $v_1', \ldots, v_n'$ represent the next state. Finally, we convert $f_R$ to a BDD.
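For a synchronous Boolean network such as the one in Section 2.2, the predicate $f_R$ is simply the conjunction $\bigwedge_i (v_i' \leftrightarrow F_i(v))$, where $F_i$ is the transfer function of node $i$. The Python sketch below enumerates states explicitly for readability; it builds $f_R$ from a hypothetical three-variable `update` function and uses it to compute the set of successors (the image) of a set of states. A BDD-based checker would compute the same image with conjunction and existential quantification instead of enumeration.

```python
from itertools import product
from typing import Set, Tuple

State = Tuple[bool, ...]


def update(v: State) -> State:
    """Hypothetical synchronous transfer functions F_i for three nodes."""
    x1, x2, x3 = v
    return (x1, x1 and not x3, x2)


def f_R(v: State, v_next: State) -> bool:
    """f_R(v, v') = 1 iff every v'_i equals F_i(v)."""
    return all(a == b for a, b in zip(update(v), v_next))


def image(states: Set[State], n: int = 3) -> Set[State]:
    """Post-image {v' | exists v in states with f_R(v, v')}."""
    return {w for w in product((False, True), repeat=n)
            if any(f_R(v, w) for v in states)}


if __name__ == "__main__":
    print(image({(True, False, False)}))  # {(True, True, False)}
```

Because every state has exactly one synchronous successor, a relation built this way is automatically total, as required for the Kripke structure.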


2.4  Results  and  Discussion 


We used NuSMV [60], a symbolic model checker, to determine whether our in silico pancreatic cancer cell model satisfies certain properties written in a temporal logic. In our model, we set the initial values of ARF, INK4a, and SMAD4 to OFF (0), while Cyclin D is set to ON (1). These choices are motivated by the following observations. According to the genetic progression model of pancreatic adenocarcinoma, the malignant transformation from normal duct to pancreatic adenocarcinoma requires multiple genetic alterations during the progression of neoplastic growth, represented by the pancreatic intraepithelial neoplasia (PanIN) stages PanIN-1A/B, PanIN-2, and PanIN-3 [23]. The loss of function of CDKN2A, which encodes the two tumor suppressors INK4A and ARF, occurs in 80-95% of sporadic pancreatic adenocarcinomas [185]. SMAD4 is a key component of the TGFβ pathway, which can inhibit the growth of most normal epithelial cells by blocking the G1-S phase transition of the cell cycle; it is frequently lost or mutated in pancreatic adenocarcinoma [224]. Furthermore, it has been shown that the loss of SMAD4 can predict decreased survival in pancreatic adenocarcinoma 111 1611. Besides the loss of many tumor suppressors, the oncoprotein Cyclin D is frequently overexpressed in many human pancreatic endocrine tumors [58]. As shown in Table 2.1, we divide the properties that we have considered into three categories, according to their relationship with Cell Fate, Cell Cycle, and Oscillations.

Cell  Fate 

The  first  properties  we  verify  concern  the  pancreatic  cancer  cell’s  fate,  i.e.,  survival  or  death. 
In  our  model,  the  following  two  CTL  properties  are  false, 


AF  Apoptosis,  AF  Arrest 


which  mean  that  the  cell  does  not  necessarily  have  to  undergo  apoptosis,  and  that  the  cell  cycle 
does  not  necessarily  stop.  On  the  other  hand,  the  property 

AF  Proliferate 


is  true,  indicating  that  the  cancer  cell  will  necessarily  proliferate.  Furthermore,  since  the  following 
“steady  state”  property  is  true, 

AF  AG  Proliferate 


| Property | Verification result | Discussion |
|---|---|---|
| Cell Fate | | |
| AF Apoptosis ∨ AF Arrest | False | the cell does not necessarily have to undergo apoptosis, and the cell cycle does not necessarily stop |
| AF Proliferate | True | the cancer cell will necessarily proliferate |
| AF AG Proliferate | True | proliferation is eventually both unavoidable and permanent |
| AF !Apoptosis ∧ AF !Arrest | True | it is always possible for the cancer cell to reach states in which Apoptosis and Arrest are OFF, thereby making cell proliferation possible |
| AF (!Apoptosis ∧ !Arrest ∧ Proliferate) | False | the model does not necessarily reach a state in which apoptosis and cell cycle arrest are both OFF and cell proliferation is active |
| AF AG !Apoptosis ∨ AF AG !Arrest | False | neither the inhibition of apoptosis nor that of cell cycle arrest is unavoidable and permanent |
| Cell Cycle | | |
| A (!Proliferate U CyclinD) | True | it is always the case that cell proliferation does not occur until Cyclin D is expressed (or activated) |
| AF AG CyclinD | False | in our model, the activation of Cyclin D is not a steady state |
| !E (!P53 U Apoptosis) | False | apoptosis can be activated even when P53 is not |
| Oscillations | | |
| TGFβ → AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB)) | True | an initial overexpression of TGFβ always leads to oscillations in NFkB's expression level |
| PIP3 → AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB)) | True | PIP3 has a similar impact on NFkB's expression level |
| AG ((P53 → AF MDM2) ∧ (MDM2 → AF !P53)) | True | overexpression of P53 will always activate MDM2, which will in turn inhibit P53 |

Table 2.1: Model checking results.


we  know  that  proliferation  is  eventually  both  unavoidable  and  permanent.  We  now  ask  whether 
it  is  always  possible  for  the  cancer  cell  to  reach  states  in  which  Apoptosis  and  Arrest  are  OFF, 
thereby making cell proliferation possible. The following two properties are true.


AF !Apoptosis, AF !Arrest


However,  the  property 


AF (!Apoptosis ∧ !Arrest ∧ Proliferate)

is false, which means that the model does not necessarily reach, along every path, a state in which apoptosis and cell cycle arrest are both OFF and cell proliferation is active. We also report that the two properties

AF AG !Apoptosis, AF AG !Arrest

are false, so neither the inhibition of apoptosis nor that of cell cycle arrest is unavoidable and permanent.

Cell Cycle

We now study properties involving the cell cycle, in which the protein Cyclin D is a key player. The next property is true,

A (!Proliferate U CyclinD)

which means that it is always the case that cell proliferation does not occur until Cyclin D is expressed (or activated). This property agrees with the experimental finding that Cyclin D is frequently overexpressed in pancreatic tumors [58], and it suggests that Cyclin D is potentially a good target for pancreatic cancer treatments. However, in our model the activation of Cyclin D is not a steady state, since the following property is false.

AF AG CyclinD

Next, we study the role of P53 in apoptosis. It is known that P53 can induce apoptosis through several signaling pathways [120]. Here, we ask whether, in our model, there is no path along which P53 remains inactive until Apoptosis is activated. This question can be encoded in the following CTL formula, which is verified to be false.

!E (!P53 U Apoptosis)

Thus, Apoptosis can be activated even when P53 is not.

Oscillations 

There have been several experimental demonstrations of oscillations in NFkB signaling [122, 169]. We therefore ask whether our in silico model features oscillations as well. A CTL formula encoding oscillations in NFkB is the following,

AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB))

which turns out to be false. Next, we check whether overexpression of TGFβ can instead induce oscillations in NFkB. The formula

TGFβ → AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB))

is in fact true, which means that an initial overexpression of TGFβ always leads to oscillations in NFkB's expression level. A similar property holds for PIP3.

PIP3 → AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB))

This property is actually an invariant of the model, since the following formula is also true.

AG (PIP3 → AG ((!NFkB → AF NFkB) ∧ (NFkB → AF !NFkB)))

It would be interesting to test the properties regarding TGFβ and PIP3 experimentally. Finally, oscillations have also been detected in the expression levels of P53 and MDM2. In [197], oscillations of P53 lasted more than 72 hours after cell damage induced by γ radiation. The next property is true,

AG ((P53 → AF MDM2) ∧ (MDM2 → AF !P53))

which  means  that  overexpression  of  P53  will  always  activate  MDM2,  which  will  in  turn  inhibit 
P53. 
Chapter 3

Biological  Signaling  Networks  as 
Qualitative  Networks  and  Improved 
Bounded  Model  Checking 


Boolean networks have been one successful approach to the use of abstraction in biology. Boolean networks abstract the status of each modeled substance as either active (on) or inactive (off). Although this is a very high-level abstraction, it has been found useful for gaining a better understanding of certain biological systems [188, 193]. The appeal of this discrete approach, along with the shortcomings of its very aggressive abstraction, led researchers to suggest various formalisms, such as Qualitative Networks [190] and Gene Regulatory Networks [168], that allow models to be refined beyond the Boolean approach. In these formalisms, every substance can have one of a small number of discrete levels. Dependencies between substances become algebraic functions instead of Boolean functions. Dynamically, a state of the model corresponds to a valuation of each of the substances, and the values of substances change gradually based on these algebraic functions. Qualitative networks and similar formalisms (e.g., genetic regulatory networks [204]) have proven suitable for modeling some biological systems [3T. H88lH90l[204l.

In this chapter, we consider model checking of qualitative networks. One of the unique features of qualitative networks is that they have no initial states; that is, the set of initial states is the set of all states. Obviously, when searching for specific executions or when trying to prove a certain property, we may want to restrict attention to certain initial states. However, the general lack of initial states suggests a unique approach towards model checking. Let $R_i$ denote the set of states reachable in exactly $i$ steps. Since every state is initial, $R_1 \subseteq R_0$; and since the post-image operator is monotone, $R_i \subseteq R_{i-1}$ implies $R_{i+1} = \mathrm{post}(R_i) \subseteq \mathrm{post}(R_{i-1}) = R_i$. It follows that a state that is not visited after $i$ steps will not be visited after $i'$ steps, for every $i' > i$. These "decreasing" sets of reachable states allow us to create a more efficient symbolic representation of all the paths of a certain length.

However,  this  observation  alone  is  not  enough  to  create  an  efficient  model  checking  procedure. 
Indeed,  accurately  representing  the  set  of  reachable  states  at  a  certain  time  amounts  to  the  original 
problem  of  model  checking  (for  reachability),  which  does  not  scale.  In  order  to  address  this  we  use 
an over-approximation of the set of states that are reachable in exactly n steps. We represent the
over-approximation  as  a  Cartesian  product  of  the  set  of  values  that  are  reachable  for  each  variable 
at  every  time  point.  The  computation  of  this  over-approximation  never  requires  us  to  consider  more 
than  two  adjacent  states  of  the  system.  Thus,  it  can  be  computed  quite  efficiently.  Then,  using  this 
over-approximation  we  create  a  much  smaller  encoding  of  the  set  of  possible  paths  in  the  system. 
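Both observations can be illustrated on a hypothetical two-variable qualitative network over {0, 1, 2}, in which `a` follows `b` and `b` follows `2 - a`, using the update rule described in Section 3.1 (each variable moves one unit toward its target). In this Python sketch, whose names and example network are assumptions for illustration, `exact_sequence` computes the reachable sets $R_0 \supseteq R_1 \supseteq \cdots$ starting from all states, and `box_step` computes per-variable value sets whose Cartesian product over-approximates them, looking only at a variable's inputs and its own value at the previous time point. The chapter's method uses such an over-approximation to shrink a symbolic encoding of paths; enumeration is used here only to make the inclusions checkable.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

State = Tuple[int, ...]

# Hypothetical two-variable QN over {0, 1, 2}: a follows b, b follows 2 - a.
ORDER: List[str] = ["a", "b"]
INPUTS: Dict[str, List[str]] = {"a": ["b"], "b": ["a"]}
TARGET = {"a": lambda e: e["b"], "b": lambda e: 2 - e["a"]}
GRAN: Dict[str, int] = {"a": 2, "b": 2}


def toward(v: int, t: int, n: int) -> int:
    """QN update of one variable: one unit toward target t, within {0, ..., n}."""
    if v < t and v < n:
        return v + 1
    if v > t and v > 0:
        return v - 1
    return v


def step(s: State) -> State:
    env = dict(zip(ORDER, s))
    return tuple(toward(env[x], TARGET[x](env), GRAN[x]) for x in ORDER)


def exact_sequence(k: int) -> List[Set[State]]:
    """R_0 = all states (no initial states), R_{i+1} = post(R_i); hence R_{i+1} <= R_i."""
    r = set(product(*(range(GRAN[x] + 1) for x in ORDER)))
    seq = [r]
    for _ in range(k):
        r = {step(s) for s in r}
        seq.append(r)
    return seq


def box_step(box: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Next per-variable value sets from the current sets of each variable and its
    inputs; their Cartesian product over-approximates the next reachable set."""
    new: Dict[str, Set[int]] = {}
    for x in ORDER:
        deps = sorted(set(INPUTS[x]) | {x})
        vals: Set[int] = set()
        for combo in product(*(sorted(box[d]) for d in deps)):
            env = dict(zip(deps, combo))
            vals.add(toward(env[x], TARGET[x](env), GRAN[x]))
        new[x] = vals
    return new


if __name__ == "__main__":
    print([len(r) for r in exact_sequence(3)])  # [9, 5, 5, 5]
```

In this small example the box does not shrink (each variable can still take all three values), while the exact set drops from 9 to 5 states; the over-approximation trades precision for a computation that never has to consider whole states.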

We test our method on many of the biological models developed using Qualitative Networks. Properties expressed as Linear Temporal Logic (LTL) [178] formulas are translated into an additional set of constraints on the set of paths; our encoding is based on temporal testers [179]. The experimental results show a significant acceleration when the decreasing reachability property of qualitative networks is taken into account. In many examples, in particular larger and more complicated biological models, this technique leads to considerable speedups. The technique scales well with the size of the models and with the length of the paths sought. In particular, for an existing model of Leukemia, our approach is at least 5 times faster than the standard approach, and up to 100 times faster in some cases. These results are especially encouraging given the methodology biologists have been using when employing our tools [29]. Typically, models are constructed and then compared with experimental results. Model development is a highly iterative process involving trial and error, in which the biologist compares the current model with experimental results and refines it until it matches current experimental knowledge. In this iterative process, it is important to give fast answers to the biologist's queries. We hope that, with the speedups afforded by this new technique, model checking can be incorporated into the routine methodology of experimental biologists using our tools.


3.1  Qualitative  Networks  Example 


We  start  with  an  example  giving  some  introduction  to  Qualitative  Networks  and  the  usage  of  LTL 
model  checking  in  this  context. 


Figure 3.1 shows a model representing aspects of cell-fate determination during C. elegans vulval development [89]. The part shown in the figure includes three cells. Each cell in the model represents a vulval precursor cell (VPC), and the elements inside it represent proteins whose level of activity influences the cell's decision as to which part of the vulva its descendants should form. All cells execute the same program; it is the communication between the cells themselves, as well as the communication between the cells and additional parts of the model (i.e., external signals), that determines a different fate for each cell. Understanding cell-fate determination is crucial for our understanding of normal development processes, as well as of cases where these go wrong, such as disease and cancer. The pictorial view gives rise to a formal model expressed as a qualitative network [190]. Formal definitions are given in the next section.


Each of the cells in the model includes executing components, for example LET-60, each of which corresponds to a single variable. Each variable $v$ holds a value in $\{0, 1, \ldots, N_v\}$, where $N_v$ is the granularity of the variable. Specifically, in Figure 3.1 all variables range over $\{0, 1, 2\}$. A target function $T_v$, defined over the values of the variables affecting $v$ (i.e., those having incoming arrows into $v$), determines how $v$ is updated: if $v < T_v$ and $v < N_v$, then $v' = v + 1$; if $v > T_v$ and $v > 0$, then $v' = v - 1$; otherwise $v$ does not change. In a qualitative network all variables are updated synchronously in parallel.

Figure 3.1: A pictorial view of part of a model describing aspects of cell-fate determination during C. elegans vulval development [89]. The image shows two cells having the "same program". Neighboring cells and connections between cells are not shown.
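The update rule can be sketched in Python as follows. The function names and the dictionary representation of states and target functions are assumptions for illustration; the point is that each variable moves at most one unit per step toward its target, stays within $\{0, \ldots, N_v\}$, and that all targets are evaluated on the same (old) state, reflecting the synchronous update.

```python
from typing import Callable, Dict

QNState = Dict[str, int]
Target = Callable[[QNState], int]


def update_variable(v: int, target: int, n_v: int) -> int:
    """Move one unit toward the target while staying within {0, ..., n_v}."""
    if v < target and v < n_v:
        return v + 1
    if v > target and v > 0:
        return v - 1
    return v


def qn_step(state: QNState, targets: Dict[str, Target],
            granularity: Dict[str, int]) -> QNState:
    """Synchronous step: every target T_v is evaluated on the old state."""
    return {name: update_variable(state[name], targets[name](state), granularity[name])
            for name in state}


if __name__ == "__main__":
    # Hypothetical two-variable network: x follows y, y is driven down to 0.
    targets = {"x": lambda s: s["y"], "y": lambda s: 0}
    print(qn_step({"x": 0, "y": 2}, targets, {"x": 2, "y": 2}))  # {'x': 1, 'y': 1}
```

Note that `x` moves to 1 rather than jumping to its target 2: changes in a qualitative network are gradual.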

Intuitively, the update function of each variable is designed such that the value of the variable follows its target function, which depends on other variables. In the biological setting, the typical target of a variable $v$ combines the positive influence of variables $w_1, w_2, \ldots, w_s$ with the negative influence of variables $w_{s+1}, w_{s+2}, \ldots, w_{s+r}$:

$$T_v = \max\left(0, \left\lfloor \frac{1}{s}\sum_{k=1}^{s} w_k - \frac{1}{r}\sum_{k=1}^{r} w_{s+k} + \frac{1}{2} \right\rfloor\right)$$


Graphically, this is often represented as an influence graph with → edges from each of $w_1, w_2, \ldots, w_s$ to $v$ and ⊣ edges from each of $w_{s+1}, w_{s+2}, \ldots, w_{s+r}$ to $v$. More complicated target functions can be defined using algebraic expressions over $w_1, \ldots, w_{s+r}$. We refer the reader to [29, 69] for further details about other modeling options.
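As a sketch of the default target function above, the following Python code averages the activator and inhibitor levels. It assumes that an empty group of activators or inhibitors contributes 0 and that the $+\tfrac{1}{2}$ together with the floor rounds the difference to the nearest integer (halves rounded up). Exact rational arithmetic (`fractions.Fraction`) avoids floating-point surprises at the rounding boundary; the function name is hypothetical.

```python
from fractions import Fraction
from math import floor
from typing import Sequence


def default_target(activators: Sequence[int], inhibitors: Sequence[int]) -> int:
    """T_v = max(0, floor(avg(activators) - avg(inhibitors) + 1/2))."""
    pos = Fraction(sum(activators), len(activators)) if activators else Fraction(0)
    neg = Fraction(sum(inhibitors), len(inhibitors)) if inhibitors else Fraction(0)
    return max(0, floor(pos - neg + Fraction(1, 2)))


if __name__ == "__main__":
    print(default_target([2, 1], []))      # 3/2 + 1/2 = 2
    print(default_target([2, 2, 1], [1]))  # 5/3 - 1 + 1/2 = 7/6, floored to 1
    print(default_target([1], [2]))        # negative difference is clipped to 0
```

The result may exceed $N_v$ only if the inputs have a larger granularity than $v$; the update rule then simply keeps $v$ at $N_v$.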

Specifically, in the model above, the target of lst is:

$$T_{\textit{lst}} = \min(2 - \textit{signalact}, 1) \cdot \textit{lin-12}$$

This models activation by lin-12 and inhibition by signalact. However, inhibition occurs only when signalact is at its maximal level (2). When inhibition is not maximal, the target follows the value of lin-12. The target of SEM-5 is:

$$T_{\textit{SEM-5}} = \max\big(0,\; 2 - (2 - \textit{signalact}) \cdot (\max(\textit{lst} - 1, 0) + 1)\big)$$

This function means that lst inhibits SEM-5 and signal_act activates it. However, activation takes precedence: inhibition takes effect only when activation is below its maximum value (2) and inhibition is at its maximum value (2). Otherwise, the target follows the value of its activator (signal_act).
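The two target functions can be checked by enumerating all of their inputs. The sketch below assumes, as the text suggests, that signal_act, lin-12, and lst all range over {0, 1, 2}; the function names are hypothetical.

```python
def target_lst(signal_act, lin12):
    # T_lst = min(2 - signal_act, 1) * lin-12
    return min(2 - signal_act, 1) * lin12


def target_sem5(signal_act, lst):
    # T_SEM-5 = max(0, 2 - (2 - signal_act) * (max(lst - 1, 0) + 1))
    return max(0, 2 - (2 - signal_act) * (max(lst - 1, 0) + 1))


LEVELS = range(3)  # assumed range {0, 1, 2} for all three variables
for sa in LEVELS:
    print("signal_act =", sa,
          "| T_lst over lin-12:", [target_lst(sa, x) for x in LEVELS],
          "| T_SEM-5 over lst:", [target_sem5(sa, x) for x in LEVELS])
```

Each printed row shows T_lst collapsing to 0 only when signal_act = 2, and T_SEM-5 falling below signal_act only when lst = 2 and signal_act < 2, as described above.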

Models  are  analyzed  to  ensure  that  they  reproduce  behavior  that  is  observed  in  experiments.  A 
mismatch  between  the  model  and  experimental  observations  signifies  that  something  is  wrong  with 
our  understanding  of  the  system.  In  such  a  case,  further  analysis  is  required  in  order  to  understand 
whether  and  how  the  model  needs  to  be  changed.  Models  are  usually  analyzed  by  simulating 
them and following the behavior of their components. A property of special interest in these types of models is stability: there is a unique state that has a self-loop, and all executions lead to that state [69, 190]. When a model does stabilize, it is interesting to check the values of the variables at the stabilization point. In addition, regardless of whether the model stabilizes, model checking is used to prove properties of the model or to search for interesting executions. For example, the following properties are of interest for the model in Figure 3.1.


•  Do there exist executions leading to adjacent primary fates in which the increase of LS happens after the down-regulation of lin-12? This property is translated to an LTL formula of the following form:

$$\theta \wedge \mathbf{FG}\, p_{ij} \wedge (\neg d_i \,\mathbf{U}\, l_i) \wedge (\neg d_j \,\mathbf{U}\, l_j)$$

where θ is some condition on initial states, p_{ij} is the property characterizing the states in which VPCs i and j are both in primary fate, d_i is the property that LS is high in VPC i, l_i is the property that lin-12 is low in VPC i, and d_j and l_j are defined similarly for VPC j. This property is run in positive mode, i.e., we search for an execution that satisfies it.


•  Is it true that, for runs starting from a given set of states, the sequence of events leading to fate execution follows the pattern: MPK-1 increases to a high level, then lin-12 is down-regulated, and then LS is activated? This property is translated to an LTL formula of the following form:

$$\theta \Rightarrow \mathbf{F}\,\bigl(m_i \wedge \mathbf{XF}\,(l_i \wedge \mathbf{XF}\, d_i)\bigr)$$

where θ is some condition on initial states, m_i is the property characterizing states in which VPC i has a high level of MPK-1, l_i is the property characterizing states in which VPC i has a low level of lin-12, and d_i is the property characterizing states in which VPC i has a high level of LS. This property is run in negative (model checking) mode, i.e., we search for executions falsifying the property and expect the search to fail.
3.2 Qualitative Networks

In  this  section,  we  formally  introduce  the  Qualitative  Networks  (QN)  framework  and  recall  the 
definition  of  linear  temporal  logic  (LTL). 

A qualitative network (QN) Q(V, T, N) of granularity N + 1 consists of variables V = (v_1, v_2, ..., v_n). (Note that, for simplicity, we assume that all variables have the same range {0, ..., N}. The extension to individual ranges is not complicated, and our implementation supports individual ranges for variables.) A state of the system is a finite map s : V → {0, 1, ..., N}. Each variable v_i ∈ V has a target function T_i ∈ T associated with it: T_i : {0, 1, ..., N}^n → {0, 1, ..., N}. Qualitative networks update the variables using synchronous parallelism.

Target functions in qualitative networks direct the execution of the network: from state s = (d_1, d_2, ..., d_n), the next state s' = (d'_1, d'_2, ..., d'_n) is computed by:

$$d'_i = \begin{cases} d_i + 1 & \text{if } d_i < T_i(s) \text{ and } d_i < N, \\ d_i - 1 & \text{if } d_i > T_i(s) \text{ and } d_i > 0, \\ d_i & \text{otherwise.} \end{cases} \qquad (1)$$
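Equation (1) translates directly into a synchronous step function. In this Python sketch the target functions are arbitrary callables on the whole state, and the two-variable network in the example (N = 2, v_0 following v_1, v_1 inhibited by v_0) is hypothetical.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
Target = Callable[[State], int]


def qn_step(state: State, targets: Sequence[Target], n_max: int) -> State:
    """One synchronous step of Equation (1): each variable moves one unit toward its target."""
    nxt = []
    for i, d in enumerate(state):
        t = targets[i](state)  # every target reads the *current* state (synchronous update)
        if d < t and d < n_max:
            nxt.append(d + 1)
        elif d > t and d > 0:
            nxt.append(d - 1)
        else:
            nxt.append(d)
    return tuple(nxt)


# Hypothetical network with N = 2: v0 follows v1, v1 is inhibited by v0.
targets = [lambda s: s[1], lambda s: 2 - s[0]]
s = (0, 0)
for _ in range(5):
    print(s)
    s = qn_step(s, targets, n_max=2)
```

Starting from (0, 0), the run enters the cycle (0, 1), (1, 2), (2, 1), (1, 0), so this small network keeps oscillating from that initial state.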

A target function of a variable v is typically a simple algebraic function, such as sum, over several other variables w_1, w_2, ..., w_m. We often say that v depends on w_1, w_2, ..., w_m or that w_1, w_2, ..., w_m are inputs of v. In the following, we use the term network to refer to a qualitative network.

A QN Q(V, T, N) defines a state space Σ = {s : V → {0, 1, ..., N}} and a transition function f : Σ → Σ, where f(s) = s' such that for every v ∈ V, s'(v) depends on T_v(s) as in Equation (1). For a state s ∈ Σ we also write s_v for s(v). In particular, f_v(s) = f(s)(v) is the value of v in f(s). We say that a state s is recurring if it is possible to get back to s after a finite number of applications of f, that is, if f^i(s) = s for some i > 0. As the state space of a qualitative network is finite, the set of recurring states is never empty. We say that a network is stabilizing if there exists a unique recurring state s. That is, there is a unique state s such that f(s) = s, and for every other state s' and every i > 0 we have f^i(s') ≠ s'. Intuitively, this means that, starting from an arbitrary state, we always end up in a fixpoint, and always in the same one. A run of a QN Q(V, T, N) is an infinite sequence r = s_0, s_1, ... such that for every i ≥ 0 we have s_i ∈ Σ and s_{i+1} = f(s_i).
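For very small networks, recurring states and stabilization can be checked by brute force directly from these definitions. The sketch below is only illustrative: it enumerates the whole state space, whose size (N + 1)^n is exactly what makes this approach infeasible for realistic models. Both example networks and all function names are hypothetical.

```python
from itertools import product


def all_states(n_vars, n_max):
    return list(product(range(n_max + 1), repeat=n_vars))


def step_from_targets(targets, n_max):
    """Transition function f of Equation (1) for the given target functions."""
    def f(s):
        ts = [T(s) for T in targets]
        return tuple(d + 1 if d < t and d < n_max else d - 1 if d > t and d > 0 else d
                     for d, t in zip(s, ts))
    return f


def recurring_states(f, states):
    """States s with f^i(s) = s for some i > 0."""
    rec = set()
    for s in states:
        x = f(s)
        for _ in range(len(states)):  # a cycle through s has length at most |states|
            if x == s:
                rec.add(s)
                break
            x = f(x)
    return rec


def is_stabilizing(f, states):
    # Every state on a cycle is recurring, so a unique recurring state is a fixpoint.
    return len(recurring_states(f, states)) == 1


states = all_states(2, 2)
stable = step_from_targets([lambda s: 2, lambda s: s[0]], 2)
osc = step_from_targets([lambda s: s[1], lambda s: 2 - s[0]], 2)
print(is_stabilizing(stable, states), recurring_states(stable, states))
print(is_stabilizing(osc, states), sorted(recurring_states(osc, states)))
```

The second network has a fixpoint (1, 1) and a four-state cycle, so it has five recurring states and is not stabilizing.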

We now define LTL over runs of qualitative networks as follows. For every variable v ∈ V and every value n ∈ {0, 1, ..., N}, we define an atomic proposition v ∼ n, where ∼ ∈ {>, ≥, <, ≤}. Let AP denote the set of all atomic propositions (for a network Q). The set of LTL formulas is:

$$\varphi ::= AP \mid \varphi \vee \varphi \mid \neg\varphi \mid \mathbf{X}\varphi \mid \varphi\,\mathbf{U}\,\varphi$$

As usual, we introduce ∧, →, F, and G as syntactic sugar.

An LTL formula φ is satisfied over a run r = s_0, s_1, ... at location i, denoted r, i ⊨ φ, according to the following:

•  For φ = v ∼ n ∈ AP we have r, i ⊨ φ if s_i(v) ∼ n.

•  For φ = ¬ψ we have r, i ⊨ φ if it is not the case that r, i ⊨ ψ.

•  For φ = ψ_1 ∨ ψ_2 we have r, i ⊨ φ if either r, i ⊨ ψ_1 or r, i ⊨ ψ_2.

•  For φ = Xψ we have r, i ⊨ φ if r, i + 1 ⊨ ψ.

•  For φ = ψ_1 U ψ_2 we have r, i ⊨ φ if there is j ≥ i such that r, j ⊨ ψ_2 and for every i ≤ k < j we have r, k ⊨ ψ_1.

We say that a run r satisfies an LTL formula φ, denoted r ⊨ φ, if r, 0 ⊨ φ. Given a qualitative network Q, we say that Q satisfies φ, denoted Q ⊨ φ, if for every run r of Q we have r ⊨ φ. If Q ⊭ φ, a counterexample is a run r such that r ⊭ φ.
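These semantics can be made concrete for runs shaped like a lasso (a finite prefix followed by a loop repeated forever), the shape used by the bounded model checking search in this section. In this sketch formulas are nested tuples, we add a constant true for convenience, F and G are derived exactly as the syntactic sugar above (Fφ = true U φ, Gφ = ¬F¬φ), and U is computed as a least fixpoint around the loop. The representation and helper names are our own assumptions, and the example run comes from a hypothetical network.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def ap(v, op, n):
    return ("ap", v, op, n)


def F(phi):
    return ("U", ("true",), phi)


def G(phi):
    return ("not", F(("not", phi)))


def holds(phi, run, loop_start):
    """Truth value of phi at each position of the lasso run[0..l-1];
    the last state is followed by run[loop_start]."""
    l = len(run)
    succ = [i + 1 if i + 1 < l else loop_start for i in range(l)]
    kind = phi[0]
    if kind == "true":
        return [True] * l
    if kind == "ap":
        _, v, op, n = phi
        return [OPS[op](s[v], n) for s in run]
    if kind == "not":
        return [not x for x in holds(phi[1], run, loop_start)]
    if kind == "or":
        a, b = holds(phi[1], run, loop_start), holds(phi[2], run, loop_start)
        return [x or y for x, y in zip(a, b)]
    if kind == "X":
        a = holds(phi[1], run, loop_start)
        return [a[succ[i]] for i in range(l)]
    if kind == "U":
        a, b = holds(phi[1], run, loop_start), holds(phi[2], run, loop_start)
        val = [False] * l  # least fixpoint of val[i] = b[i] or (a[i] and val[succ[i]])
        changed = True
        while changed:
            changed = False
            for i in range(l):
                new = b[i] or (a[i] and val[succ[i]])
                if new != val[i]:
                    val[i], changed = new, True
        return val
    raise ValueError(f"unknown operator {kind!r}")


# Lasso: (0,0), then the cycle (0,1), (1,2), (2,1), (1,0) repeated forever.
run = [(0, 0), (0, 1), (1, 2), (2, 1), (1, 0)]
print(holds(G(F(ap(0, ">=", 2))), run, 1)[0])  # v0 reaches 2 infinitely often
print(holds(F(G(ap(0, ">=", 2))), run, 1)[0])  # v0 eventually stays at 2
```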

We use bounded model checking [64] to check whether a qualitative network satisfies a given LTL formula φ. Intuitively, we search for a run of a certain structure (and length) that does not satisfy the formula by constructing a Boolean formula whose satisfiability corresponds to such a run. Searching for a counterexample of length l means that we (1) create Boolean variables that represent the state of the system at l different time points, (2) add constraints enforcing that the transition of the qualitative network holds between every two consecutive time points, (3) add constraints enforcing that the transition of the qualitative network holds between the state at time l - 1 (the last state) and some previous state (i.e., that the sequence of states ends in a loop), and (4) add Boolean variables and constraints that enforce satisfaction of the negation of the temporal property.
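Steps (1) to (4) can be illustrated without a SAT solver. The sketch below is an explicit-state stand-in, not the Boolean encoding used in our implementation: because the QN transition function is deterministic and every state may be initial, a candidate run of length l is fixed by its first state, so we enumerate first states, unroll l - 1 transitions, and check whether the last state loops back. The predicate `bad` plays the role of the encoded negated property; it, the example network, and all names are hypothetical.

```python
from itertools import product


def step_from_targets(targets, n_max):
    """Transition function f of Equation (1)."""
    def f(s):
        ts = [T(s) for T in targets]
        return tuple(d + 1 if d < t and d < n_max else d - 1 if d > t and d > 0 else d
                     for d, t in zip(s, ts))
    return f


def find_lasso(f, n_vars, n_max, length, bad):
    """Return (run, loop_start) with run[t+1] = f(run[t]) and f(run[-1]) = run[loop_start]
    such that bad(run, loop_start) holds, or None if no such loop of this length exists."""
    for s0 in product(range(n_max + 1), repeat=n_vars):  # (1): every state may be initial
        run = [s0]
        for _ in range(length - 1):                      # (2): consecutive transitions
            run.append(f(run[-1]))
        nxt = f(run[-1])
        for loop_start in range(length):                 # (3): last state loops back
            if run[loop_start] == nxt and bad(run, loop_start):  # (4): negated property
                return run, loop_start
    return None


def never_two(run, loop_start):
    # Counterexample to "v0 reaches 2 infinitely often": v0 stays below 2 on the loop.
    return all(s[0] < 2 for s in run[loop_start:])


osc = step_from_targets([lambda s: s[1], lambda s: 2 - s[0]], 2)
print(find_lasso(osc, 2, 2, 5, never_two))
```

The search finds the fixpoint (1, 1) repeated five times. Enumerating first states is exponential in the number of variables, which is why the actual approach delegates this search to a SAT solver.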

To create a Boolean encoding of the LTL formula we use a variant of the temporal testers approach of [179]. Specifically, for every temporal subformula (and every time point in the trace) we add a Boolean variable that tracks the truth value of the subformula at that time. The truth values of these variables are connected to the truth values of propositions (encoded through the state of the model) and to the truth values of other subformulas. In addition, we add constraints that enforce satisfaction of eventualities in the loop. To search for a trace that satisfies a certain LTL formula, we add the encoding of the formula to the trace; a satisfying assignment then provides a run satisfying the formula. To prove that all runs up to a certain length satisfy a formula, we add the encoding of the negation of the formula to the trace; unsatisfiability then proves that no run (of the given length) violates the formula.


3.3  Decreasing  Reachability  Sets 

A notable difference between QNs and “normal” transition systems is that QNs do not specify initial states. For example, in the classical stability analysis all states are considered initial states. It follows that if a state s of a QN is not reachable after i steps, it is not reachable after i' steps for every i' > i. Thus, there is a decreasing sequence of sets Σ_0 ⊇ Σ_1 ⊇ ... ⊇ Σ_j such that the search for runs of the network can be restricted to runs of the form Σ_0, Σ_1, ..., (Σ_j)^ω. Here we show how to take advantage of this fact to construct a more scalable model checking algorithm for qualitative networks.

Consider a qualitative network Q(V, T, N) with set of states Σ = {s : V → {0, ..., N}}. We say that a state s ∈ Σ is reachable by exactly i steps if there is some run r = s_0, s_1, ... such that s = s_i. Dually, we say that s is not reachable by exactly i steps if for every run r = s_0, s_1, ... we have s_i ≠ s.

Lemma  3.1.  If  a  state  s  is  not  reachable  by  exactly  i  steps  then  it  is  not  reachable  by  exactly  i' 
steps  for  every  i'  >  i. 
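Lemma 3.1 relies only on the fact that QNs have no designated initial states. A short argument (our own sketch) is:

```latex
\textbf{Proof sketch.} Let $i' > i$ and suppose $s$ is reachable by exactly $i'$ steps,
i.e., $s = s_{i'}$ for some run $r = s_0, s_1, \ldots$ of $Q$. Since a QN has no designated
initial states, the suffix $r' = s_{i'-i}, s_{i'-i+1}, \ldots$ is also a run of $Q$, and its
state at position $i$ is $s_{i'} = s$. Hence $s$ is reachable by exactly $i$ steps, and the
lemma is the contrapositive of this statement.
```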

Algorithm 1 computes a decreasing sequence Σ_0 ⊃ Σ_1 ⊃ ... ⊃ Σ_{j-1} such that all states that are reachable by exactly i steps are in Σ_i if i < j and in Σ_{j-1} if i ≥ j. We note that the definition of Σ_{j+1} in line 5 is equivalent to the standard Σ_{j+1} = f(Σ_j), where f(·) is lifted to sets of states to compute the next reachable set. However, we write it as in the algorithm below in order to stress that only states in Σ_j are candidates for inclusion in Σ_{j+1}. Given the sets Σ_0, ..., Σ_{j-1}, every run r = s_0, s_1, ... of Q satisfies s_i ∈ Σ_i for i < j and s_i ∈ Σ_{j-1} for i ≥ j. In particular, if Q ⊭ φ for some LTL formula φ, then a run witnessing the violation of φ can be searched for in this smaller space of runs. Unfortunately, Algorithm 1 is not feasible in practice: it amounts to computing the exact reachability sets of the QN Q, which does not scale well [69].


Algorithm 1 Concrete Decreasing Reachability

1: Σ_0 = Σ;
2: Σ_{-1} = ∅;
3: j = 0;
4: while Σ_{j-1} ≠ Σ_j do
5:     Σ_{j+1} = Σ_j \ {s' ∈ Σ_j | ∀s ∈ Σ_j · s' ≠ f(s)};
6:     j++;
7: end while
8: return Σ_0, ..., Σ_{j-1}
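For intuition, Algorithm 1 can be run explicitly on a toy network. This Python sketch stores every Σ_j as a set of states, so it inherits exactly the scalability problem noted above; the network is a hypothetical two-variable example (N = 2) in which v_0 follows v_1 and v_1 is inhibited by v_0.

```python
from itertools import product


def step_from_targets(targets, n_max):
    """Transition function f of Equation (1)."""
    def f(s):
        ts = [T(s) for T in targets]
        return tuple(d + 1 if d < t and d < n_max else d - 1 if d > t and d > 0 else d
                     for d, t in zip(s, ts))
    return f


def concrete_decreasing_reachability(f, states):
    """Algorithm 1 on explicit sets: returns [Sigma_0, ..., Sigma_{j-1}]."""
    seq = [frozenset(states)]   # line 1: Sigma_0 = Sigma
    prev = frozenset()          # line 2: Sigma_{-1} = empty set
    while prev != seq[-1]:      # line 4: while Sigma_{j-1} != Sigma_j
        cur = seq[-1]
        image = {f(s) for s in cur}
        prev = cur
        # line 5: drop every state of Sigma_j that has no predecessor in Sigma_j
        seq.append(frozenset(s for s in cur if s in image))
    return seq[:-1]             # line 8: Sigma_0, ..., Sigma_{j-1}


osc = step_from_targets([lambda s: s[1], lambda s: 2 - s[0]], 2)
sigmas = concrete_decreasing_reachability(osc, product(range(3), repeat=2))
print([len(S) for S in sigmas])  # [9, 5]
```

Here Σ_1 keeps the 5 states that have a predecessor, and Σ_2 = Σ_1, so the algorithm returns [Σ_0, Σ_1].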


In order to use Lemma 3.1 effectively, we combine it with over-approximation, which leads to a scalable algorithm. Specifically, instead of considering the set Σ_k of states reachable at step k, we identify for every variable v_i ∈ V the domain D_{i,k} of values possible at time k for variable v_i. Just as for the general set of states, when we consider the possible values of variable v_i we get D_{i,0} ⊇ D_{i,1} ⊇ ... ⊇ D_{i,j}. The advantage is that the sets D_{i,k} for all v_i ∈ V and k ≥ 0 can be constructed inductively, using only the knowledge of the previous ranges and the target function of one variable.

Consider Algorithm 2. For each variable, it initializes the set of possible values at time 0 to the set of all values. Then, based on the possible values at time j, it computes the possible values at time j + 1. The actual check can be implemented either explicitly, if the number of inputs of all target functions is small (as in most cases), or symbolically. Considering only the variables (and values) that are required to decide the possible values of variable v_i at time j makes the problem much simpler than the general reachability problem. Notice that, again, only values that are possible at time j need be considered at time j + 1. That is, D_{i,j+1} starts as empty (line 6) and only values from D_{i,j} are added to it (lines 7-10). As before, D_{i,j+1} is the projection of f(D_{1,j} × ··· × D_{m,j}) on v_i. The notation used in the algorithm stresses that only values in D_{i,j} are candidates for inclusion in D_{i,j+1}.


Algorithm 2 Abstract Decreasing Reachability

1: ∀v_i ∈ V · D_{i,0} = {0, 1, ..., N};
2: ∀v_i ∈ V · D_{i,-1} = ∅;
3: j = 0;
4: while ∃v_i ∈ V · D_{i,j} ≠ D_{i,j-1} do
5:     for each v_i ∈ V do
6:         D_{i,j+1} = ∅;
7:         for each d ∈ D_{i,j} do
8:             if ∃(d_1, ..., d_m) ∈ D_{1,j} × ··· × D_{m,j} · f_{v_i}(d_1, ..., d_m) = d then
9:                 D_{i,j+1} = D_{i,j+1} ∪ {d};
10:            end if
11:        end for
12:    end for
13:    j++;
14: end while
15: return ∀v_i ∈ V ∀j' < j · D_{i,j'}
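Algorithm 2 with the explicit check can be sketched as follows. The next value of a variable depends only on its own current value and its inputs, so the test in line 8 is decided by enumerating the product of those few domains. The input lists, the dictionary-based target functions, and the three-variable chain in the example (N = 2, T_0 = 2, T_1 = v_0, T_2 = v_1) are hypothetical; j is tracked implicitly by the length of each list.

```python
from itertools import product


def next_value(d, t, n_max):
    """Equation (1) for one variable with current value d and target t."""
    if d < t and d < n_max:
        return d + 1
    if d > t and d > 0:
        return d - 1
    return d


def abstract_decreasing_reachability(n_max, inputs, targets):
    """inputs[i]: indices read by T_i; targets[i]: maps {index: value} to a target value.
    Returns, for every variable v_i, the list D_{i,0}, ..., D_{i,j-1}."""
    n = len(inputs)
    doms = [[frozenset(range(n_max + 1))] for _ in range(n)]  # line 1: D_{i,0}
    prev = [frozenset() for _ in range(n)]                    # line 2: D_{i,-1}
    while any(doms[i][-1] != prev[i] for i in range(n)):      # line 4
        new = []
        for i in range(n):                                    # line 5
            relevant = sorted(set(inputs[i]) | {i})           # v_i and its inputs only
            image = set()
            for vals in product(*(doms[k][-1] for k in relevant)):
                a = dict(zip(relevant, vals))
                image.add(next_value(a[i], targets[i](a), n_max))
            # lines 6-10: keep the values of D_{i,j} that some input combination produces
            new.append(frozenset(d for d in doms[i][-1] if d in image))
        for i in range(n):  # advance j only after all D_{i,j+1} are computed from D_{k,j}
            prev[i] = doms[i][-1]
            doms[i].append(new[i])
    return [seq[:-1] for seq in doms]                         # line 15


res = abstract_decreasing_reachability(
    2, [[], [0], [1]], [lambda a: 2, lambda a: a[0], lambda a: a[1]])
for i, seq in enumerate(res):
    print(f"v{i}:", [sorted(d) for d in seq])
```

The domains shrink one variable at a time along the chain: v_0 reaches {2} after two steps, v_1 after three, and v_2 after four.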


The algorithm produces very compact information that enables a subsequent search for runs of the QN. Namely, for every variable v_i and every time point 0 ≤ k < j we have a decreasing sequence of domains

$$D_{i,0} \supseteq D_{i,1} \supseteq \cdots \supseteq D_{i,k}.$$

Consider a qualitative network Q(V, T, N), where V = {v_1, ..., v_n}, and a run r = s_0, s_1, .... As before, every run satisfies that, for every i and every t, s_t(v_i) ∈ D_{i,t} for t < j and s_t(v_i) ∈ D_{i,j-1} for t ≥ j.

We look for paths that are in the form of a lasso, as we explain below. We say that r is a loop of length l if for some 0 < k ≤ l and for all m ≥ 0 we have s_{l+m} = s_{l-k+m}. That is, the run r is obtained by considering a prefix of l - k states followed by a loop of k states that repeats forever. A search for a loop of length l that satisfies an LTL formula φ can be encoded as a bounded model checking query as follows. We encode the existence of l states s_0, ..., s_{l-1}. We use the decreasing reachability sets D_{i,t} to force state s_t to be in D_{1,t} × ··· × D_{n,t}. This leads to a smaller encoding of the states s_0, ..., s_{l-1} and to a smaller search space. We add constraints enforcing that s_{t+1} = f(s_t) for every 0 ≤ t < l - 1. Furthermore, we encode the existence of a time l - k such that s_{l-k} = f(s_{l-1}). We then search for a loop of length l that satisfies φ. It is well known that if there is a run of Q that satisfies φ, then there is some l and a loop of length l that satisfies φ. We note that there is sometimes a mismatch between the length of the loop sought and the length j of the sequence of sets produced by Algorithm 2. Suppose that the algorithm returns the sets D_{i,t} for v_i ∈ V and 0 ≤ t < j. If l > j, we use the sets D_{i,j-1} to “pad” the sequence; thus, the states s_j, ..., s_{l-1} are also sought in ∏_i D_{i,j-1}. If l < j, we use the sets D_{i,0}, ..., D_{i,l-2}, D_{i,j-1} for v_i ∈ V; thus, only the last state s_{l-1} is ensured to be in our “best” approximation ∏_i D_{i,j-1}. A detailed explanation of how we encode the decreasing reachability sets as a Boolean satisfiability problem is given in [62].
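The choice of which set constrains each state s_t can be written as a small index function. This sketch assumes Algorithm 2 returned D_{i,0}, ..., D_{i,j-1}; the function name is hypothetical. Using D_{i,j-1} for the last state is sound because s_{l-1} lies on the loop and therefore recurs at arbitrarily late times t ≥ j. When l = j, both rules coincide.

```python
def domain_index(t, l, j):
    """Index of the set D_{i,.} that constrains s_t in a loop of length l,
    when Algorithm 2 returned D_{i,0}, ..., D_{i,j-1}."""
    if not 0 <= t < l:
        raise ValueError("need 0 <= t < l")
    if l > j:
        return min(t, j - 1)            # pad: s_j, ..., s_{l-1} all use D_{i,j-1}
    return j - 1 if t == l - 1 else t   # l <= j: the last state uses the best set D_{i,j-1}


print([domain_index(t, 5, 3) for t in range(5)])  # l > j: [0, 1, 2, 2, 2]
print([domain_index(t, 3, 5) for t in range(3)])  # l < j: [0, 1, 4]
```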


3.4  Results  for  Various  Biological  Models 


We implemented this technique to work on models defined through our tool BMA [29]. Here, we present experimental results of running our implementation on a set of different biological models, comprising a total of 22 benchmark problems from various sources: skin cell differentiation models built by ourselves, diabetes models from PH-, models of cell fate determination during C. elegans vulval development, a Drosophila embryo development model from [188], Leukemia models built by ourselves, and a few additional examples constructed by ourselves. The number of variables in each model and the maximal range of its variables are reported in Table 3.1.


Model name            #Vars   Range
2var_unstable             2   0..1
Bcr-Abl                  57   0..2
Bcr-AblNoFeedbacks       54   0..2
BooleanLoop               2   0..1
NoLoopFound               5   0..4
Skin1D_TF_0              75   0..4
Skin1D_TF_1              75   0..4
Skin1D                   75   0..4
Skin2D_3X2_0             90   0..4
Skin2D_3X2_1             90   0..4
Skin2D_3X2_2             90   0..4
Skin2D_5X2_TF           198   0..4
Skin2D_5X2              198   0..4
SmallTestCase             3   0..4
SSkin1D_TF_0             30   0..4
SSkin1D_TF_1             31   0..4
SSkin1D                  30   0..4
SSkin2D_3X2              40   0..4
VerySmallTest             2   0..4
VPClin15ko               85   0..2
VPC_Non_stable           33   0..2
VPC_stable               43   0..2

Table  3.1:  Number  of  variables  in  models  and  their  ranges. 


Our experiments compare two encodings. One, referred to as “opt” (for optimized), is the encoding explained above, which uses the sets produced by Algorithm 2; the other considers l states s_0, ..., s_{l-1} where s_t(v_i) ∈ {0, ..., N} for every t and every i. That is, for every variable v_i and every time point 0 ≤ t < l we take D_{i,t} = {0, ..., N}. This encoding is referred to as “naive”. In both cases we use the same encoding into a Boolean satisfiability problem. Further details about the exact encoding can be found in [62].

We perform two kinds of experiments. First, we search for loops of length 10, 20, ..., 50 on all the models, with both the optimized and the naive encodings. Second, we search for loops that satisfy a certain LTL property (either as a counterexample to model checking or as an example run satisfying a given property). Again, this is performed for both the optimized and the naive encodings. LTL properties are considered only for four biological models; the properties were suggested by our collaborators as interesting properties to check for these models. For both experiments, we report separately the global time and the time spent in the SAT solver. All experiments were run on an Intel Xeon machine with CPU X7560@2.27GHz running Windows Server 2008 R2 Enterprise.


In Tables 3.2 and 3.3 we include experimental results for the search for loops. We compare the global runtime of the optimized search with that of the naive search. The global runtime for the optimized search includes the time it takes to compute the sequence of decreasing reachability sets. Accordingly, in some of the models, especially the smaller ones, the overhead of computing this additional information makes the optimized computation slower than the naive one. For reference, we also include the net runtime spent in the SAT solver.


In Table 3.4 we include experimental results for the model checking experiment. As before, we include the results of running the search for counterexamples of lengths 10, 20, 30, 40, and 50. We include the total runtime of the optimized and the naive approaches as well as the time spent in the SAT solver. As before, the global runtime for the optimized search includes the computation of the decreasing reachability sets. The properties in the table are of the following form. Let I, a, ..., d denote formulas that are Boolean combinations of propositions.


•  I → (¬a) U b: we check that the sequence of events when starting from the given initial states (I) respects the order that b happens before a.

•  I ∧ FG a ∧ F (b ∧ XF c): we check that the model gets from some states (I) to a loop that satisfies the condition a, and that on the path leading to the loop b happens first and then c.

•  I ∧ FG a ∧ F (b ∧ XF (c ∧ XF d)): we extend the previous property by checking the sequence b, then c, and then d on the path leading to a loop that satisfies a.

•  I ∧ FG a ∧ (¬b) U c: we check that the model gets from some states (I) to a loop that satisfies the condition a, and that on the path leading to the loop b cannot happen before c.

•  GF a ∧ GF b: we check for the existence of loops that exhibit a form of instability by visiting both states that satisfy a and states that satisfy b infinitely often.

For the path search, the new technique does not offer a significant advantage on many of the smaller models. However, on larger models, in particular the two-dimensional skin model (Skin2D_5X2 from [190]) and the Leukemia model (Bcr-Abl), the new technique is an order of magnitude faster. Furthermore, it scales much better than the naive approach as the path length increases. For model checking, combining the search with the decreasing reachability sets accelerates it considerably: while the runtime of the naive search grows to the order of tens of minutes, the optimized search remains within the order of 10 s, which affords a “real-time” response to users.
Table 3.2: Searching for loops (10, 20, 30).


Table  3.3:  Searching  for  loops  (40,  50). 
Chapter 4

Phage-based Bacteria Killing as a Nonlinear Hybrid Automaton and δ-complete Decision-based Bounded Model Checking


Due to the widespread misuse and overuse of antibiotics, drug-resistant bacteria now pose significant risks to health, agriculture, and the environment. We were therefore interested in an alternative to conventional antibiotics: phage therapy. Phages, or bacteriophages, are viruses that infect bacteria and have evolved to manipulate the bacterial cells and genome, making resistance to bacteriophages difficult to achieve. However, many phages are temperate, meaning that they can enter a lysogenic phase and therefore not lyse and kill the host bacteria. The addition of a phototoxic protein, KillerRed [176], to the system offers a second method of killing the bacteria targeted by a lysogenic phage. In this chapter, we construct a hybrid model of a bacteria-killing procedure that mimics the stages through which bacteria pass when phage therapy is adopted. Our model was designed according to an experimental procedure that engineers a temperate phage, Lambda (λ), and then kills bacteria via light-activated production of superoxide. We applied δ-complete decision-based bounded model checking [95] to our model, and the results show that this approach can speed up the evaluation of a system that would be impractical, or possibly not even feasible, to study in a wet lab.


4.1  The  KillerRed  Model 

The discovery of antibiotics has been quickly followed by the development of antibiotic resistance, and new medicines for tackling this issue are becoming increasingly scarce. The document released by the CDC (Centers for Disease Control and Prevention), “Antibiotic Resistance Threats in the United States, 2013” flU, aims to raise public awareness of the problems associated with the overuse and misuse of antibiotics and to outline the threats to society caused by these organisms. The organisms have been categorized by hazard level as urgent, serious, and concerning. Over 2 million illnesses and 23,000 deaths per year are a direct result of antibiotic resistance.

There are multiple mechanisms of antibiotic resistance. First, altered permeability: the antimicrobial agent may be unable to enter the bacterial cell, or it may be actively exported from the cell. Second, resistance is often the result of the production of an enzyme that is capable of inactivating the antimicrobial agent. Next, resistance can arise from alteration of the target site of the antimicrobial agent. Finally, resistance can result from the acquisition of a new enzyme that replaces the sensitive one, thus replacing the pathway that was originally sensitive to the antibiotic with another pathway.

The CDC outlines four core actions that will help fight deadly infections [14]: (a) preventing infections and the spread of resistance; (b) tracking resistant bacteria; (c) improving the use of today’s antibiotics; and (d) promoting the development of new antibiotics and of new diagnostic tests for resistant bacteria. Recently, we have addressed this problem by designing a new system that relies on phage-based therapy. Phages, or bacteriophages, are viruses that infect bacteria and have evolved to manipulate the bacterial cells and genome, making resistance to bacteriophages difficult to achieve. Bacteriophages are complex and utilize many host pathways, so that they cannot be inactivated or bypassed. Bacteriophages infect only specific hosts and can kill the host by cytolysis. However, many phages are temperate, meaning that they can enter a lysogenic phase and therefore not lyse and kill the host bacteria. The addition of a phototoxic protein to the system offers a second method of killing the bacteria targeted by a lysogenic phage. Thus, our system, shown in Figure 4.1, explores the possibility that temperate phages can also be used for phage therapy and bacteria-killing applications. We incorporated several proteins (KillerRed EM, SuperNova [202]) that have been shown to be phototoxic and that provide another level of controlled bacteria killing.

We have modeled the synthesis and action of KillerRed, which occur over the three main phases of a typical photobleaching experiment: induction at 37°C, storage at 4°C to allow for protein maturation, and photobleaching at room temperature. Within these phases, we identify the following stages of interest in KillerRed synthesis and activity.

-  mRNA  synthesis  and  degradation 

-  KillerRed  synthesis,  maturation,  and  degradation 

-  KillerRed states: singlet (S), singlet excited (S*), triplet excited (T*), and deactivated (Da)

-  Superoxide  production  (by  KillerRed) 

-  Superoxide  elimination  (by  superoxide  dismutase) 


We implemented these system stages as distinct model states and outline them in Figure 4.3, together with the state variables (values are included if variables are fixed within a state), the transitions between states, and the events that trigger state transitions. In Table 4.1 we list the model states that are used to describe the stages of the system. In the following, we detail our implementation of the system stages within the model and list the equations that we derived for each stage.


Cell  exposure  to  light 

In [206], the authors describe a method for determining the rate coefficient k_a of activation from the ground state: k_a = σI. Here σ is the optical cross-section per molecule and I is the excitation intensity in photons per unit area per unit time. The lamp used for photobleaching gave I = 1 × 10^27 photons/(cm^2 s) (about 1 W). σ is given by σ = ε (1000 cm^3/L)(ln 10)/N_A, where ε is the extinction coefficient and N_A is Avogadro's number. We calculated k_a = 1.72 × 10^11 s^-1 for KillerRed for our photobleaching experiments. The rate constant for returning to the ground state is k_f = ln 2/τ, where τ is the half-life of KillerRed in the excited state. τ for KillerRed is assumed to be similar to τ for DsRed (about 3.0 ns [38]), since their chromophores are identical. Thus, assuming that KillerRed is always in the excited state (if it has not been deactivated) during photobleaching, we have k_f = 2.3 × 10^8 s^-1 and F = k_a/(k_a + k_f) = 0.9987.

Figure 4.1: Interactions between phage and bacteria used in our model (stages shown: expression of KillerRed, light-activated production of superoxide, host death)

Figure 4.2: Energy diagram for a generic fluorochrome [198]
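The arithmetic behind k_a, k_f, and F can be reproduced in a few lines. The extinction coefficient is not given in the text; the value ε = 45,000 M^-1 cm^-1 below is an assumption, and with it the formula yields the reported k_a. The variable names are our own.

```python
import math

N_A = 6.02214076e23   # Avogadro's number, 1/mol
EPSILON = 45_000.0    # ASSUMED KillerRed extinction coefficient, M^-1 cm^-1 (not given in the text)
I = 1e27              # lamp intensity from the text, photons / (cm^2 s)
TAU = 3.0e-9          # excited-state half-life in s (assumed equal to DsRed's)

sigma = EPSILON * 1000.0 * math.log(10) / N_A   # optical cross-section, cm^2 per molecule
k_a = sigma * I                                 # activation rate, 1/s
k_f = math.log(2) / TAU                         # return rate to the ground state, 1/s
F = k_a / (k_a + k_f)

print(f"sigma = {sigma:.3g} cm^2, k_a = {k_a:.3g} 1/s, k_f = {k_f:.3g} 1/s, F = {F:.4f}")
```

This prints k_a ≈ 1.72e+11 1/s, k_f ≈ 2.31e+08 1/s, and F = 0.9987, in line with the values above.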

Production  of  superoxide 

Production of the superoxide radical is governed by several reactions. Fluorescein is used as a model chromophore. S, S*, T*, and Da are the singlet, excited singlet, excited triplet, and deactivated states of the chromophore, respectively. Figure 4.2 outlines the transitions between the different forms of the chromophore. Fluorochrome molecules absorb photon energy at a rate k_a and go from the ground singlet state S up to the excited singlet state S*. They may then return to the ground state through radiative (fluorescence) or non-radiative (internal conversion) pathways at a combined rate k_d. They may also undergo non-radiative intersystem crossing, at a rate k_isc, to T*, from which they may return to the ground state at a rate k_t. Photobleaching may take place from both S* and T*, at rates k_bs and k_bt, respectively. Photobleached molecules can no longer participate in the excitation-emission cycle.
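The kinetic scheme above can be written as mass-action ODEs, as in the minimal Python sketch below. The rate constants are hypothetical dimensionless values chosen only for illustration, and the triplet-to-ground-state rate is called k_t in this sketch. Because every transition only moves molecules between the four populations, their total is conserved, which gives a convenient sanity check.

```python
from typing import Dict, List

def fluorochrome_rhs(y: List[float], k: Dict[str, float]) -> List[float]:
    """Mass-action rates for the ground singlet (S), excited singlet (S*),
    excited triplet (T*) and photobleached/deactivated (Da) populations."""
    s, s_star, t_star, _da = y
    ds = -k["a"] * s + k["d"] * s_star + k["t"] * t_star
    ds_star = k["a"] * s - (k["d"] + k["isc"] + k["bs"]) * s_star
    dt_star = k["isc"] * s_star - (k["t"] + k["bt"]) * t_star
    dda = k["bs"] * s_star + k["bt"] * t_star   # bleached molecules never leave Da
    return [ds, ds_star, dt_star, dda]

def integrate_euler(y0: List[float], k: Dict[str, float],
                    dt: float, steps: int) -> List[float]:
    """Explicit Euler integration; adequate here because dt * max rate << 1."""
    y = list(y0)
    for _ in range(steps):
        dy = fluorochrome_rhs(y, k)
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
    return y

# Hypothetical dimensionless rate constants, chosen only to exercise the scheme.
rates = {"a": 1.0, "d": 5.0, "isc": 0.5, "t": 0.1, "bs": 0.01, "bt": 0.05}
final = integrate_euler([1.0, 0.0, 0.0, 0.0], rates, dt=1e-3, steps=20_000)
print("S, S*, T*, Da =", [round(v, 4) for v in final])
print("total =", round(sum(final), 9))   # conserved: the four rates sum to zero
```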

Superoxide  dismutase 

Superoxide dismutase is E. coli's main defense against superoxide. Its action was incorporated using Michaelis-Menten kinetics:

$$
\frac{d[O_2^{-}]}{dt} = -\frac{V_{max}\,[O_2^{-}]}{K_m + [O_2^{-}]},
$$

where V_max was estimated using k_cat from [138], and K_m was estimated using k_cat and k_cat/K_m from the literature.
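The Michaelis-Menten term can be evaluated as in the Python sketch below. It assumes the standard relations V_max = k_cat[E] and K_m = k_cat/(k_cat/K_m); the enzyme concentration and kinetic constants are hypothetical placeholders, not the literature values used in the model.

```python
def km_from_specificity(k_cat: float, k_cat_over_km: float) -> float:
    """K_m recovered from k_cat and the specificity constant k_cat / K_m."""
    return k_cat / k_cat_over_km

def sod_rate(sox: float, v_max: float, k_m: float) -> float:
    """d[O2-]/dt due to superoxide dismutase (negative: superoxide is consumed)."""
    return -v_max * sox / (k_m + sox)

# Hypothetical illustrative values, not the literature values used in the model.
k_cat = 1.0e4            # turnover number, 1/s
k_cat_over_km = 1.0e9    # specificity constant, 1/(M s)
enzyme = 1.0e-6          # SOD concentration, M
v_max = k_cat * enzyme   # V_max = k_cat [E], M/s
k_m = km_from_specificity(k_cat, k_cat_over_km)

for sox in (1e-7, k_m, 1e-3):
    print(f"[O2-] = {sox:.1e} M -> rate = {sod_rate(sox, v_max, k_m):.3e} M/s")
```

At [O₂⁻] = K_m the rate is half of V_max, and for [O₂⁻] much larger than K_m it saturates at V_max.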

Cell without λ-phage genome

The first system stage that we model is a bacterial cell into which the phage genome has not been injected and in which gene transcription is not induced. Thus, all model elements are at their initial levels, assumed to be 0. In the model, we assume that the λ-phage genome is injected into the bacterial cell with rate k_1, or t_1 time units after the start of time counting. This has no effect when analyzing individual cells, but it is important to take into account when analyzing a cell population.

Cell with injected λ-phage genome

After the phage genome is injected into the cell, it is inserted into the bacterial DNA with rate k_2; in terms of time units, it takes t_2 time units to integrate the phage genome into the bacterial plasmid once it is inside the cell. However, since IPTG has not yet been added to the cell, we assume that gene transcription is not yet induced. Therefore, like the previous two states (the initial state and the state with the injected phage genome), this state is assumed to be static.
Figure 4.3: Hybrid automaton for our KillerRed model


Addition of IPTG

When IPTG is added to the system, transcription starts. Transcriptional efficiency is measured by the rate of mRNA synthesis, k_RNAsyn. Our construct uses a wild-type lac promoter, so we assume that its transcription rates are similar to those of the lac operon. Next, mRNA is translated into immature KillerRed molecules with translational efficiency k_KRsyn. The maximum translation rate in the model is three orders of magnitude lower, to reflect the presence of several rare codons. This adjustment is suggested by comparing our fluorescence data for KillerRed and mRFP, which have nearly identical brightness.

Immature synthesized KillerRed (KR_im) requires additional time to become the mature form (KR_m) and to fold and form a dimer (KR_md). These two events together occur with the overall rate k_KRm. The folded and dimerized form of KillerRed that can be activated by light is called the singlet form (KR_mdS). Degradation of synthesized mRNA and of KillerRed in both modeled forms is included in the equations with rates k_mRNAdeg (characteristic half-life), k_KRimdeg, and k_KRmdSdeg, respectively. In this model state, which begins with the addition of IPTG and ends with either the removal of IPTG or the addition of light, we use the following ordinary differential equations (ODEs) to describe the continuous dynamics.


$$
\begin{aligned}
\frac{d[mRNA]}{dt} &= k_{RNAsyn}\,[DNA] - k_{mRNAdeg}\,[mRNA] \\
\frac{d[KR_{im}]}{dt} &= k_{KRimsyn}\,[mRNA] - (k_{KRm} + k_{KRimdeg})\,[KR_{im}] \\
\frac{d[KR_{mdS}]}{dt} &= k_{KRm}\,[KR_{im}] - k_{KRmdSdeg}\,[KR_{mdS}]
\end{aligned}
$$
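The three synthesis equations form a linear cascade and can be integrated directly. The Python sketch below assumes the reading d[mRNA]/dt = k_RNAsyn[DNA] − k_mRNAdeg[mRNA], d[KR_im]/dt = k_KRimsyn[mRNA] − (k_KRm + k_KRimdeg)[KR_im], and d[KR_mdS]/dt = k_KRm[KR_im] − k_KRmdSdeg[KR_mdS], treats [DNA] as constant, and uses hypothetical parameter values. Starting from zero, as assumed for the earlier states, the trajectory should approach the analytic steady state obtained by setting each derivative to zero.

```python
def iptg_rhs(y, p):
    """Derivatives of [mRNA, KR_im, KR_mdS] while IPTG is present and light is off."""
    mrna, kr_im, kr_mds = y
    d_mrna = p["k_RNAsyn"] * p["DNA"] - p["k_mRNAdeg"] * mrna
    d_kr_im = p["k_KRimsyn"] * mrna - (p["k_KRm"] + p["k_KRimdeg"]) * kr_im
    d_kr_mds = p["k_KRm"] * kr_im - p["k_KRmdSdeg"] * kr_mds
    return [d_mrna, d_kr_im, d_kr_mds]

def rk4_step(f, y, p, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y, p)
    k2 = f([yi + h / 2 * ki for yi, ki in zip(y, k1)], p)
    k3 = f([yi + h / 2 * ki for yi, ki in zip(y, k2)], p)
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)], p)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def steady_state(p):
    """Set each derivative to zero and solve the cascade top-down."""
    mrna = p["k_RNAsyn"] * p["DNA"] / p["k_mRNAdeg"]
    kr_im = p["k_KRimsyn"] * mrna / (p["k_KRm"] + p["k_KRimdeg"])
    kr_mds = p["k_KRm"] * kr_im / p["k_KRmdSdeg"]
    return [mrna, kr_im, kr_mds]

# Hypothetical parameter values (arbitrary units), for illustration only.
params = {"DNA": 1.0, "k_RNAsyn": 0.5, "k_mRNAdeg": 0.2, "k_KRimsyn": 0.3,
          "k_KRm": 0.1, "k_KRimdeg": 0.05, "k_KRmdSdeg": 0.02}
y = [0.0, 0.0, 0.0]              # all species start at zero
for _ in range(10_000):          # integrate to t = 1000 with h = 0.1
    y = rk4_step(iptg_rhs, y, params, 0.1)
print("simulated:", [round(v, 4) for v in y])
print("analytic: ", steady_state(params))
```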


| State | State description | Input | Next state(s) |
| S0 | Initial system state, bacteria cell, without phage | n/a | S1 (ex.) |
| S1 | Phage genome injected | λ-phage genome | S2 (in.), S3 (in.) |
| S2 | Phage genome replication (lytic cycle) | Genome replication | n/a |
| S3 | Phage genome within bacterial DNA (lysogenic cycle) | Genome insertion | S4 (ex.) |
| S4 | Gene transcription, translation | Addition of IPTG | S5 (ex.), S6 (ex.) |
| S5 | Gene transcription decrease | Removal of IPTG | S3 (in.) |
| S6 | Activation of KillerRed | Light turned ON | S7 (ex.), S8 (ex.), S11 (in.) |
| S7 | Mixture of KillerRed forms, no activation | Light turned OFF | S9 (ex.), S11 (in.) |
| S8 | Mixture of KillerRed forms, transcription decrease | Removal of IPTG | S10 (ex.), S11 (in.) |
| S9 | Mixture of KillerRed forms, no activation, transcription decrease | Removal of IPTG | S11 (in.) |
| S10 | Mixture of KillerRed forms, transcription decrease, no activation | Light turned OFF | S11 (in.) |
| S11 | Cell death | SOX > threshold | n/a |

Table 4.1: List of modeled system states, their descriptions, inputs, and next state(s), with an indication of whether each transition is triggered by an external input (ex.) or by an internal variable (in.) reaching some specified value.
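Reading the next-state column of Table 4.1 as a directed graph gives the discrete skeleton of the model. The Python sketch below (a hypothetical helper, not part of the model files) transcribes that column and uses breadth-first search to list the states from which cell death (S11) is reachable. It ignores the continuous dynamics and guard values, so it only shows which state sequences are structurally possible.

```python
from collections import deque

# Next-state relation transcribed from Table 4.1 (ex./in. labels dropped).
NEXT = {
    "S0": ["S1"], "S1": ["S2", "S3"], "S2": [], "S3": ["S4"],
    "S4": ["S5", "S6"], "S5": ["S3"], "S6": ["S7", "S8", "S11"],
    "S7": ["S9", "S11"], "S8": ["S10", "S11"], "S9": ["S11"],
    "S10": ["S11"], "S11": [],
}

def reachable_from(start: str) -> set:
    """All states reachable from `start` (including itself) by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for nxt in NEXT[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def can_reach(target: str) -> list:
    """States from which `target` is reachable in the discrete skeleton."""
    return sorted((s for s in NEXT if target in reachable_from(s)),
                  key=lambda s: int(s[1:]))

print(can_reach("S11"))   # S2 (lytic cycle) is the only state that cannot lead to S11
```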


Addition  of  light 

Addition of light moves the system from the state with KillerRed synthesis into the state of activating KillerRed, S. In the state where the system is exposed to light, other forms of KillerRed are present, including the excited singlet state S* (KR_mdS*) and the triplet state T* (KR_mdT*). Transitions between different forms of KillerRed can occur; therefore, in this state, we keep the model equations above with a modified equation for KR_mdS, and add equations for the other forms of KillerRed (KR_mdS*, KR_mdT*), as well as equations for the produced superoxide (SOX) and for the effect of superoxide dismutase (SOX_sod).


$$
\begin{aligned}
\frac{d[KR_{mdS}]}{dt} &= k_{KRm}[KR_{im}] + k_{KRf}[KR_{mdS^*}] + k_{KRic}[KR_{mdS^*}] + k_{KRnrd}[KR_{mdT^*}] \\
&\quad + k_{KRsoxd1}[KR_{mdT^*}] - k_{KRex}[KR_{mdS}] - k_{KRmdSdeg}[KR_{mdS}] \\
\frac{d[KR_{mdS^*}]}{dt} &= k_{KRex}[KR_{mdS}] - k_{KRf}[KR_{mdS^*}] - k_{KRic}[KR_{mdS^*}] - k_{KRisc}[KR_{mdS^*}] \\
&\quad - k_{KRmdS^*deg}[KR_{mdS^*}] \\
\frac{d[KR_{mdT^*}]}{dt} &= k_{KRisc}[KR_{mdS^*}] - k_{KRnrd}[KR_{mdT^*}] - k_{KRsoxd1}[KR_{mdT^*}] - k_{KRsoxd2}[KR_{mdT^*}] \\
&\quad - k_{KRmdT^*deg}[KR_{mdT^*}] \\
\frac{d[SOX]}{dt} &= k_{KRsoxd1}[KR_{mdT^*}] + k_{KRsoxd2}[KR_{mdT^*}] - \frac{d[SOX_{sod}]}{dt} \\
\frac{d[SOX_{sod}]}{dt} &= k_{SOD}\, V_{maxSOD}\, \frac{[SOX]}{K_m + [SOX]}
\end{aligned}
$$


The rates at which KillerRed transitions from state S* to state S through fluorescence or internal conversion are denoted k_KRf and k_KRic, respectively. The rates at which KillerRed transitions from state T* to state S through non-radiative deactivation or through SOX production with deactivation are denoted k_KRnrd and k_KRsoxd1, respectively. The excited form of KillerRed, S*, is formed at rate k_KRex and is depleted in several ways: (a) by fluorescence with rate k_KRf, (b) by internal conversion with rate k_KRic, (c) by intersystem crossing with rate k_KRisc, and (d) by degradation with rate k_KRmdS*deg. The triplet form, T*, is formed through intersystem crossing with rate k_KRisc, and is depleted by non-radiative deactivation with rate k_KRnrd, by superoxide (ROS) production with deactivation to state S with rate k_KRsoxd1, by superoxide (ROS) production with photobleaching with rate k_KRsoxd2, and by degradation with rate k_KRmdT*deg. In addition, k_KRsoxd1 and k_KRsoxd2 can be computed from the relative propensity of KillerRed to generate superoxide without becoming deactivated (c), the photobleaching rate obtained from experiments (k_KRpb), and the quantum yield (Φ), as follows:

$$
k_{KRsoxd1} = c\,\frac{k_{KRpb}}{\Phi}, \qquad k_{KRsoxd2} = \frac{k_{KRpb}}{\Phi}
$$
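The light-on equations can be transcribed into a right-hand-side function, as in the Python sketch below. It assumes the reading k_KRsoxd1 = c·k_KRpb/Φ and k_KRsoxd2 = k_KRpb/Φ, and a SOX balance from which the SOD term is subtracted; all parameter values are hypothetical. KR_im is passed in as an input so that the block does not depend on the synthesis equations.

```python
from typing import Dict, List, Tuple

def soxd_rates(c: float, k_pb: float, phi: float) -> Tuple[float, float]:
    """k_KRsoxd1 = c * k_KRpb / Phi and k_KRsoxd2 = k_KRpb / Phi."""
    k_soxd2 = k_pb / phi
    return c * k_soxd2, k_soxd2

def light_rhs(y: List[float], kr_im: float, p: Dict[str, float]) -> List[float]:
    """Derivatives of [KR_mdS, KR_mdS*, KR_mdT*, SOX, SOX_sod] with light ON.

    kr_im is the immature KillerRed level, supplied by the synthesis equations.
    """
    s, s_star, t_star, sox, _sox_sod = y
    d_sod = p["k_SOD"] * p["VmaxSOD"] * sox / (p["Km"] + sox)
    d_s = (p["k_KRm"] * kr_im
           + (p["k_KRf"] + p["k_KRic"]) * s_star
           + (p["k_KRnrd"] + p["k_KRsoxd1"]) * t_star
           - (p["k_KRex"] + p["k_KRmdSdeg"]) * s)
    d_s_star = (p["k_KRex"] * s
                - (p["k_KRf"] + p["k_KRic"] + p["k_KRisc"] + p["k_KRmdS*deg"]) * s_star)
    d_t_star = (p["k_KRisc"] * s_star
                - (p["k_KRnrd"] + p["k_KRsoxd1"] + p["k_KRsoxd2"]
                   + p["k_KRmdT*deg"]) * t_star)
    d_sox = (p["k_KRsoxd1"] + p["k_KRsoxd2"]) * t_star - d_sod
    return [d_s, d_s_star, d_t_star, d_sox, d_sod]

# Hypothetical parameter values, for illustration only.
k1, k2 = soxd_rates(c=5.0, k_pb=1e-3, phi=0.02)   # approximately 0.25 and 0.05
params = {
    "k_KRm": 0.1, "k_KRex": 2.0, "k_KRf": 1.0, "k_KRic": 0.5, "k_KRisc": 0.3,
    "k_KRnrd": 0.2, "k_KRsoxd1": k1, "k_KRsoxd2": k2,
    "k_KRmdSdeg": 0.0, "k_KRmdS*deg": 0.0, "k_KRmdT*deg": 0.0,
    "k_SOD": 1.0, "VmaxSOD": 1e-3, "Km": 1e-5,
}
dy = light_rhs([1.0, 0.2, 0.1, 1e-4, 0.0], kr_im=0.0, p=params)
print([f"{v:.4g}" for v in dy])
```

With k_KRm and all degradation rates set to zero, the three KillerRed derivatives sum to −k_KRsoxd2[KR_mdT*], since photobleaching is then the only way KillerRed leaves the S/S*/T* cycle; this gives a cheap sanity check on any transcription of the equations.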
4.2 δ-Decisions for Hybrid Models


To validate the correctness of our model, estimate its parameters, and conduct sensitivity analysis, we constructed a hybrid model for the system and used δ-complete decision procedures [94, 95] to solve the resulting formulae. Before describing the δ-complete decision procedures, we first give the formal definition of hybrid automata.

Definition 4.2.1 (Hybrid Automaton) A hybrid automaton H consists of the following components.

• Variables. A finite set X = {x_1, ..., x_n} of real-valued variables, where n is the dimension of H. We write Ẋ for the set {ẋ_1, ..., ẋ_n} to represent the first derivatives of the variables during continuous change, and write X' for the set {x'_1, ..., x'_n} to denote the values of the variables at the conclusion of a discrete change.

• Control graph. A finite directed multigraph (V, E). The vertices in V are called control modes, and the edges in E are called control switches.

• Initial, invariant, and flow conditions. Three vertex labeling functions over the control modes v ∈ V: the initial condition init(v) is a predicate whose free variables are from X, the invariant condition inv(v) is a predicate whose free variables are from X, and the flow condition flow(v) is a predicate whose free variables are from X ∪ Ẋ.

• Jump conditions. An edge labeling function jump that assigns to each control switch e ∈ E a predicate whose free variables are from X ∪ X'.

• Events. A finite set Σ of events, and an edge labeling function event: E → Σ that assigns an event to each control switch.
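To make the definition concrete, its components can be mirrored in a small Python data structure. This is an illustrative sketch only: predicates are modelled as Python callables over dictionaries of variable values, which is not how dReach or other tools represent hybrid automata, and the three-mode toy instance is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Valuation = Dict[str, float]   # values for the variables in X (or in Ẋ, X')

@dataclass
class ControlSwitch:
    source: str
    target: str
    event: str                                      # event(e), an element of Σ
    jump: Callable[[Valuation, Valuation], bool]    # predicate over X ∪ X'

@dataclass
class HybridAutomaton:
    variables: List[str]                                      # X
    modes: List[str]                                          # control modes V
    switches: List[ControlSwitch]                             # control switches E
    init: Dict[str, Callable[[Valuation], bool]]              # init(v), over X
    inv: Dict[str, Callable[[Valuation], bool]]               # inv(v), over X
    flow: Dict[str, Callable[[Valuation, Valuation], bool]]   # flow(v), over X ∪ Ẋ
    events: List[str] = field(default_factory=list)           # Σ

    def enabled(self, mode: str, x: Valuation, x_next: Valuation) -> List[ControlSwitch]:
        """Switches leaving `mode` whose jump condition holds for (x, x')."""
        return [e for e in self.switches if e.source == mode and e.jump(x, x_next)]

# Hypothetical toy model with one variable x (not the KillerRed model).
toy = HybridAutomaton(
    variables=["x"],
    modes=["dark", "lit", "dead"],
    switches=[
        ControlSwitch("dark", "lit", "light_on", lambda x, xn: xn["x"] == x["x"]),
        ControlSwitch("lit", "dead", "threshold",
                      lambda x, xn: x["x"] >= 1.0 and xn["x"] == x["x"]),
    ],
    init={"dark": lambda x: x["x"] == 0.0, "lit": lambda x: False, "dead": lambda x: False},
    inv={"dark": lambda x: True, "lit": lambda x: x["x"] <= 1.0, "dead": lambda x: True},
    flow={"dark": lambda x, dx: dx["x"] == 0.0,
          "lit": lambda x, dx: dx["x"] == 1.0,
          "dead": lambda x, dx: dx["x"] == 0.0},
    events=["light_on", "threshold"],
)
print([e.target for e in toy.enabled("lit", {"x": 1.0}, {"x": 1.0})])
```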

Formal verification of hybrid systems is important but challenging: systems that combine nonlinear dynamics with nontrivial discrete control are hard to handle. To overcome the undecidability of reasoning about hybrid systems, Gao et al. recently defined the concept of δ-satisfiability over the reals and presented a corresponding δ-complete decision procedure [94, 95]. The main idea is to decide correctly whether slightly relaxed sentences over the reals are satisfiable. The following definitions are from [95].

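Definition 4.2.2 (Bounded Quantifier) A bounded quantifier is one of the following:

$$
\exists^{[a,b]}x.\,\varphi = \exists x.\,(a \le x \wedge x \le b \wedge \varphi)
$$

$$
\forall^{[a,b]}x.\,\varphi = \forall x.\,((a \le x \wedge x \le b) \rightarrow \varphi)
$$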

Definition 4.2.3 (Bounded Σ1 Sentence) A bounded Σ1 sentence is an expression of the form:

$$
\exists^{I_1}x_1, \ldots, \exists^{I_n}x_n : \psi(x_1, \ldots, x_n)
$$

where I_i = [a_i, b_i] are intervals, and ψ(x_1, ..., x_n) is a Boolean combination of atomic formulas of the form g(x_1, ..., x_n) op 0, where g is a composition of Type 2-computable functions and op ∈ {<, ≤, >, ≥, =, ≠}.


Note that any bounded Σ1 sentence is equivalent to a Σ1 sentence in which all atoms are of the form f(x_1, ..., x_n) = 0 (i.e., the only op needed is '='). Essentially, Type 2-computable functions can be approximated arbitrarily well by finite computations of a special kind of Turing machine (Type 2 machines); most 'useful' functions over the reals are Type 2-computable. The notion of the δ-weakening of a bounded sentence is central to δ-satisfiability.


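Definition 4.2.4 (δ-Weakening) Let δ ∈ ℚ⁺ ∪ {0} be a constant and φ a bounded Σ1-sentence in the standard form

$$
\varphi = \exists^{I_1}x_1, \ldots, \exists^{I_n}x_n : \bigwedge_{i=1}^{m} \Big( \bigvee_{j=1}^{k_i} f_{ij}(x_1, \ldots, x_n) = 0 \Big) \tag{4.1}
$$

where f_ij(x_1, ..., x_n) = 0 are atomic formulas. The δ-weakening of φ is the formula:

$$
\varphi^{\delta} = \exists^{I_1}x_1, \ldots, \exists^{I_n}x_n : \bigwedge_{i=1}^{m} \Big( \bigvee_{j=1}^{k_i} |f_{ij}(x_1, \ldots, x_n)| \le \delta \Big) \tag{4.2}
$$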


Note that φ implies φ^δ, while the converse is in general not true. The bounded δ-satisfiability problem asks for the following: given a sentence of the form (4.1) and δ ∈ ℚ⁺, correctly decide whether it is unsat (φ is false) or δ-sat (φ^δ is true). If the two cases overlap, either decision can be returned; such a scenario reveals that the formula is fragile: a small perturbation (i.e., a small δ) can change the formula's truth value.
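The δ-weakening can be illustrated by checking a candidate witness. The Python sketch below (hypothetical helper names) evaluates a formula in the standard form (4.1) at a given point, either exactly or with tolerance δ. A δ-complete procedure searches the whole box for such a point; this sketch only checks one, but it shows how a point that is not an exact solution can still witness φ^δ.

```python
from typing import Callable, List, Sequence, Tuple

Atom = Callable[[Sequence[float]], float]   # f_ij; the atom reads f_ij(x) = 0
Box = List[Tuple[float, float]]             # bounds I_i = [a_i, b_i]

def holds(phi: List[List[Atom]], box: Box, x: Sequence[float], delta: float = 0.0) -> bool:
    """Does x witness phi (delta = 0) or its delta-weakening (delta > 0)?

    phi is a conjunction of disjunctions of atoms, as in (4.1); with delta > 0
    each atom f(x) = 0 is relaxed to |f(x)| <= delta, as in (4.2).
    """
    if not all(a <= xi <= b for xi, (a, b) in zip(x, box)):
        return False                           # outside the bounded quantifiers
    return all(any(abs(f(x)) <= delta for f in clause) for clause in phi)

# Hypothetical example: exists^{[1,2]} x : x^2 - 2 = 0.
phi = [[lambda x: x[0] ** 2 - 2.0]]
box = [(1.0, 2.0)]
print(holds(phi, box, [1.414]))               # False: 1.414 is not an exact root
print(holds(phi, box, [1.414], delta=1e-3))   # True: |1.414^2 - 2| <= 1e-3
```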


One qualitative property of hybrid systems that can be checked is bounded δ-reachability. It asks whether the system reaches the unsafe region after k ∈ ℕ discrete transitions.


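Definition 4.2.5 (Bounded k-Step δ-Reachability) Bounded k-step δ-reachability in hybrid systems can be encoded as a bounded Σ1-sentence

$$
\begin{aligned}
\exists^{X}\, & x^0_{0,q_0}, \ldots, x^0_{0,q_m},\, x^t_{0,q_0}, \ldots, x^t_{0,q_m}, \ldots, x^0_{k,q_0}, \ldots, x^0_{k,q_m},\, x^t_{k,q_0}, \ldots, x^t_{k,q_m} : \\
& \Big( \bigvee_{q \in Q} \big( \mathit{init}_q(x^0_{0,q}) \wedge \mathit{flow}_q(x^0_{0,q}, x^t_{0,q}) \big) \Big) \\
& \wedge \Big( \bigwedge_{i=0}^{k-1} \Big( \bigvee_{q,q' \in Q} \big( \mathit{jump}_{q \to q'}(x^t_{i,q}, x^0_{i+1,q'}) \wedge \mathit{flow}_{q'}(x^0_{i+1,q'}, x^t_{i+1,q'}) \big) \Big) \Big) \\
& \wedge \Big( \bigvee_{q \in Q} \mathit{unsafe}_q(x^t_{k,q}) \Big)
\end{aligned}
\tag{4.3}
$$

where x^0_{i,q} and x^t_{i,q} represent the continuous state in mode q at depth i (at the beginning and at the end of the continuous flow, respectively), and q' is a successor mode.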
Intuitively, the formula above can be understood as follows: the first conjunct asks for a set of continuous variables that satisfies the initial condition in one of the modes and the flow in that mode; the second conjunct looks for a set of vectors that satisfies any k discrete jumps and the flows in each successor mode defined by the jumps; the third conjunct verifies whether the state of the system (the mode and the continuous variables in that mode after k jumps) belongs to the unsafe region. Note that this definition asks for reachability in exactly k steps. One can build a disjunction of formula (4.3) for all values from 1 to k, thereby obtaining reachability within k steps.
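The disjunctions over q, q' ∈ Q in (4.3), together with the disjunction over depths used for reachability within k steps, range over finite sequences of modes. The Python sketch below enumerates those sequences for a hypothetical three-mode control graph, ignoring the continuous init, flow, and jump constraints that the solver must additionally satisfy for each sequence. Explicit enumeration grows exponentially with k and is shown only to illustrate the structure of the unrolling.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

def mode_paths(modes: Iterable[str], edges: Set[Tuple[str, str]],
               init_modes: Set[str], unsafe_modes: Set[str],
               k: int) -> List[Tuple[str, ...]]:
    """Mode sequences q_0, ..., q_k (exactly k jumps) from an initial to an unsafe mode."""
    modes = list(modes)
    paths = []
    for seq in product(modes, repeat=k + 1):
        ok = (seq[0] in init_modes and seq[-1] in unsafe_modes
              and all((seq[i], seq[i + 1]) in edges for i in range(k)))
        if ok:
            paths.append(seq)
    return paths

def reach_within(modes, edges, init_modes, unsafe_modes, k: int) -> bool:
    """Disjunction of the exactly-j-step queries for j = 1, ..., k."""
    return any(mode_paths(modes, edges, init_modes, unsafe_modes, j)
               for j in range(1, k + 1))

# Hypothetical three-mode example.
MODES = ["dark", "lit", "dead"]
EDGES = {("dark", "lit"), ("lit", "dead")}
print(mode_paths(MODES, EDGES, {"dark"}, {"dead"}, 2))    # [('dark', 'lit', 'dead')]
print(reach_within(MODES, EDGES, {"dark"}, {"dead"}, 1))  # False
```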

The δ-reachability problem can be solved using the δ-complete decision procedure described above, which correctly returns one of the following answers:

- unsat: the system does not reach the bad region U in k steps,

- δ-sat: the δ-perturbation of (4.3) is true, and a witness, i.e., an assignment for all the variables, is returned.


We now show that this δ-decision technique for hybrid models can be used to handle problems such as model falsification, parameter estimation, and parametric sensitivity analysis.

Model Falsification. Model falsification against existing experimental observations is essentially a bounded reachability question: expressing each experimental observation as a goal region, is there a number of steps k in which the model reaches the goal region? If none exists, the model is inconsistent with the given observation. If a witness is returned for each observation, we can conclude that the model is consistent with the given set of experimental results. This is a bounded model checking problem in which all experimental observations are expressed as reachability properties.

Parameter Estimation. The parameter estimation problem can also be encoded as a k-step reachability problem: does there exist a parameter combination for which the model reaches the given goal region in k steps? Treating a certain set of system parameters as unknowns, if a witness is returned, the corresponding assignment is potentially a good estimate for those parameters. The goal here is to find an assignment with which all the given goal regions can be reached in a bounded number of steps.

Parametric Sensitivity Analysis. Sensitivity analysis can also be conducted through a set of bounded reachability queries: for different possible values of a certain system parameter, are the results of the reachability analysis the same? If so, the model is insensitive to this parameter with regard to the given experimental observations.
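All three uses reduce to repeated reachability queries. In the Python sketch below, a closed-form toy model (dx/dt = p − d·x with x(0) = 0, a hypothetical stand-in for a call to a δ-complete solver) answers whether x reaches a goal threshold θ within a time horizon. Parameter estimation keeps the candidate values that admit a witness, an empty result corresponds to falsification, and sensitivity is judged by whether the verdict changes across a set of parameter values.

```python
import math
from typing import List

def reaches_goal(p: float, d: float, theta: float, horizon: float) -> bool:
    """Stand-in reachability query for the toy model dx/dt = p - d*x, x(0) = 0.

    x(t) = (p/d)(1 - exp(-d t)) increases monotonically, so x >= theta is
    reached within the horizon iff x(horizon) >= theta.
    """
    return (p / d) * (1.0 - math.exp(-d * horizon)) >= theta

def estimate(candidates: List[float], d: float, theta: float, horizon: float) -> List[float]:
    """Parameter estimation: keep the candidates for which a witness exists.
    An empty result means the model is falsified for this observation."""
    return [p for p in candidates if reaches_goal(p, d, theta, horizon)]

def insensitive(values: List[float], d: float, theta: float, horizon: float) -> bool:
    """Sensitivity: is the reachability verdict identical for all values?"""
    return len({reaches_goal(p, d, theta, horizon) for p in values}) == 1

print(estimate([0.25, 0.5, 0.75, 1.0, 1.25], d=0.1, theta=5.0, horizon=10.0))  # [1.0, 1.25]
print(insensitive([1.0, 1.5, 2.0], d=0.1, theta=5.0, horizon=10.0))            # True
```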


4.3  Results  and  Discussion 

Effect  of  delay  in  turning  light  ON 

First, we studied the relation between the time to turn the light ON after adding IPTG, a molecular biology reagent used to induce protein expression (t_lightON), and the total time needed until the bacterial cells are killed (t_total). We fixed the values of several other parameters as follows.

- SOX_thres = 5e-4 M: threshold for the concentration of SOX that is sufficient to kill the bacterial cells
- t_lightOFF1 = 2 hours (hrs): time to turn the light OFF after turning it ON
- t_lightOFF2 = 2 hrs: time to turn the light OFF after removing IPTG
- t_1 = 1 hr: time to inject the genome
- t_2 = 1 hr: time to insert the genome into DNA after injecting it into the bacterial cell
- t_addIPTG3 = 1 hr: time to add IPTG after inserting the phage genome into bacterial DNA

As shown in the first two rows of Table 4.2, the earlier we turn on the light after adding IPTG, the more quickly the bacterial cells are killed.

Lower  bound  for  the  duration  of  exposure  to  light 

The δ-decision technique has also been used to analyze the impact of the duration of light exposure (t_lightOFF1) on the system, and to estimate an appropriate range for t_lightOFF1 that leads to the successful killing of bacterial cells by KillerRed. By setting SOX_thres, t_lightOFF2, t_1, t_2, and t_addIPTG3 to the same values as above and assigning 2 hr to t_lightON (the time to turn the light ON after adding IPTG), we found that, in order to kill the bacterial cells, the light has to be kept ON for at least 4 hours (see rows 3-4 of Table 4.2). In addition, we found that the bacterial cells can be killed within 100 hours when the light is ON for 4 hours.
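When the outcome is monotone in the exposure time (a longer exposure never prevents killing), the lower bound on t_lightOFF1 can be located with a binary search over reachability queries instead of a full sweep. The Python sketch below illustrates this under that monotonicity assumption; it is not necessarily how the analysis above was carried out, and the hypothetical oracle simply replays the outcomes reported in rows 3-4 of Table 4.2.

```python
from typing import Callable, Optional

def minimal_duration(kills: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Smallest duration d in [lo, hi] with kills(d) True.

    Assumes monotonicity: if a duration kills the cells, any longer one does too.
    Each kills(d) call stands for one bounded reachability query.
    """
    if not kills(hi):
        return None                 # no duration in the range succeeds
    while lo < hi:
        mid = (lo + hi) // 2
        if kills(mid):
            hi = mid                # mid succeeds: the bound is mid or smaller
        else:
            lo = mid + 1            # mid fails: the bound is larger than mid
    return lo

# Outcomes for t_lightOFF1 = 1..10 hours, as reported in rows 3-4 of Table 4.2.
OUTCOME = {d: d >= 4 for d in range(1, 11)}
print(minimal_duration(OUTCOME.__getitem__, 1, 10))   # 4
```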


| t_lightON (hr)        | 1      | 2      | 3      | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
| t_total (hr)          | 16     | 17.2   | 18.5   | 20   | 21.3 | 22.7 | 23.5 | 24.1 | 25   | 30   |
| t_lightOFF1 (hr)      | 1      | 2      | 3      | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
| killed bacteria cells | failed | failed | failed | succ | succ | succ | succ | succ | succ | succ |
| t_rmIPTG3 (hr)        | 1      | 2      | 3      | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
| killed bacteria cells | succ   | succ   | succ   | succ | succ | succ | succ | succ | succ | succ |
| SOX_thres (M)         | 1e-4   | 2e-4   | 3e-4   | 4e-4 | 5e-4 | 6e-4 | 7e-4 | 8e-4 | 9e-4 | 1e-3 |
| t_total (hr)          | 5.1    | 5.2    | 5.4    | 17   | 19   | 48   | 61   | 71   | 36   | 42   |

Table 4.2: Formal analysis results for our KillerRed hybrid model


Time to remove IPTG as an insensitive parameter

We also studied the sensitivity of the successful killing of bacterial cells to the time difference between removing the light and removing IPTG (t_rmIPTG3). We observed that t_rmIPTG3 has an insignificant impact on the cell-killing outcome (see rows 5-6 of Table 4.2). This agrees with our understanding of the system, since any additional KillerRed that is synthesized will not be activated in the absence of light. For the other system parameters, we used the same values for SOX_thres, t_lightON, t_lightOFF2, t_1, t_2, and t_addIPTG3 as above, and set t_lightOFF1 to 4 hours.
Necessary level of superoxide

Finally, we used δ-decisions to assess the correctness of our hybrid model by considering various values of SOX_thres within the suggested range [100 µM, 1 mM]. We used the same values for t_lightON, t_lightOFF1, t_lightOFF2, t_1, t_2, and t_addIPTG3 as above. As rows 7-8 of Table 4.2 show, the bacterial cells can be killed in reasonable time for all 10 values of SOX_thres, which were evenly spaced over [100 µM, 1 mM]. Furthermore, we found a broader range for SOX_thres, up to 0.6667 M, with which the bacterial cells can be killed by KillerRed.
Chapter 5

Biological  Systems  as  Stochastic  Hybrid 
Models  and  SReach 


As mentioned in the introduction chapter, stochastic hybrid systems (SHSs) are formal models that tightly combine discrete, continuous, and stochastic components. One important question for the quantitative analysis of SHSs is the probabilistic reachability problem, since many verification problems can be reduced to reachability problems. It asks for the probability of reaching a certain set of states. This is no longer a decision problem: it generalizes the decision problem by asking for the probability that the system reaches the target region. For SHSs with both stochastic and non-deterministic behavior, the problem in general yields a range of probabilities, thereby becoming an optimization problem.

In this chapter, we describe our tool SReach, which supports probabilistic bounded δ-reachability analysis for two model classes: hybrid automata (HAs) [171] with parametric uncertainty, and probabilistic hybrid automata (PHAs) [200] with additional randomness. (In the following, we use the notations HA_p and PHA_r for these two model classes, respectively.) Our method combines the recently proposed δ-complete bounded reachability analysis technique [147] with statistical testing techniques. SReach retains the virtues of Satisfiability Modulo Theories (SMT) based Bounded Model Checking (BMC) for HAs [170, 205], namely the fully symbolic treatment of hybrid state spaces, while extending the reasoning power to probabilistic models. Furthermore, by using the δ-complete analysis method, the full non-determinism of the models is taken into account. The coverage of simulation is increased, since the δ-complete analysis method yields an over-approximation of the reachable set, whereas simulation yields only an under-approximation. The zero-crossing problem is also avoided: if a zero-crossing point exists, the method always returns an interval containing it. By using statistical tests, SReach can place controllable error bounds on the estimated probabilities. We discuss three biological models, namely an atrial fibrillation model, a prostate cancer treatment model, and our synthesized KillerRed model, to show that SReach can answer questions including model validation/falsification, parameter synthesis, and sensitivity analysis.
5.1 Stochastic Hybrid Models


Before introducing the algorithm implemented by SReach and the problems it can handle, we first formally define the two model classes that SReach considers. For HA_ps, we follow the definition of HAs in [171] and extend it to consider probabilistic parameters as follows.

Definition 5.1.1 (HA_p) A hybrid automaton with parametric uncertainty is a tuple H_p = ((Q, E), V, RV, Init, Flow, Inv, Jump, Σ), where

• The vertices Q = {q_1, ..., q_m} form a finite set of discrete modes, and the edges in E are control switches.

• V = {v_1, ..., v_n} denotes a finite set of real-valued system variables. We write V̇ to represent the first derivatives of the variables during continuous change, and write V' to denote the values of the variables at the conclusion of a discrete change.

• RV = {w_1, ..., w_k} is a finite set of independent random variables, where the distribution of w_i is denoted by P_i.

• Init, Flow, and Inv are labeling functions over Q. For each mode q ∈ Q, the initial condition Init(q) and the invariant condition Inv(q) are predicates whose free variables are from V ∪ RV, and the flow condition Flow(q) is a predicate whose free variables are from V ∪ V̇ ∪ RV.

• Jump is a transition labeling function that assigns to each transition e ∈ E a predicate whose free variables are from V ∪ V' ∪ RV.

• Σ is a finite set of events, and an edge labeling function event: E → Σ assigns an event to each control switch.


The other class is PHA_rs, which extend HAs with discrete probabilistic transitions and additional randomness for transition probabilities and variable resets. Specifically, for discrete transitions, instead of making a purely (non)deterministic choice over the set of currently enabled jumps, a PHA_r (non)deterministically chooses among the set of currently enabled discrete probability distributions, each of which is defined over a set of transitions whose probabilities can be uncertain.


Definition 5.1.2 (PHA_r) A probabilistic hybrid automaton with additional randomness H_r consists of Q, E, V, RV, Init, Flow, Inv, and Σ as in Definition 5.1.1, and Cmds, a finite set of probabilistic guarded commands of the form:

$$
g \rightarrow p_1 : u_1 + \cdots + p_m : u_m,
$$

where g is a predicate representing a transition guard with free variables from V; p_i is the transition probability of the ith probabilistic choice, which can be expressed by an equation involving random variable(s) in RV, with the p_i satisfying ∑_{i=1}^{m} p_i = 1; and u_i is the corresponding transition updating function of the ith probabilistic choice, whose free variables are from V ∪ V' ∪ RV.
To illustrate the additional randomness allowed for transition probabilities and variable resets, consider the example probabilistic guarded command x > 5 → p_1 : (x' = sin(x)) + (1 − p_1) : (x' = p_x), where x is a system variable, p_1 has a uniform distribution U(0.2, 0.9), and p_x has a Bernoulli distribution B(0.85). This means that the probability of choosing the first transition is not a fixed value but a random one with a uniform distribution. Also, after taking the second transition, x is assigned either 1 with probability 0.85 or 0 with probability 0.15. In general, for an individual probabilistic guarded command, the transition probabilities can be expressed by equations over one or more new random variables, as long as the values of all transition probabilities are within [0, 1] and their sum is 1. Currently, all four basic arithmetic operations are supported. Note that, to preserve the Markov property, only unused random variables can be used, so that no dependence between the current probabilistic jump and previous transitions is introduced.
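The example command can be simulated by drawing fresh values of p_1 and p_x each time the command fires, as in the Python sketch below (a hypothetical helper, not SReach code). The marginal probability of the first branch is E[p_1] = (0.2 + 0.9)/2 = 0.55, and that of resetting x to 1 is (1 − 0.55) × 0.85 = 0.3825, which the empirical frequencies should approach.

```python
import math
import random
from typing import Optional

def example_command(x: float, rng: random.Random) -> Optional[float]:
    """x > 5 -> p1 : (x' = sin(x)) + (1 - p1) : (x' = p_x); returns x' or None."""
    if not x > 5:
        return None                             # guard not enabled
    p1 = rng.uniform(0.2, 0.9)                  # fresh sample of p1 ~ U(0.2, 0.9)
    if rng.random() < p1:
        return math.sin(x)                      # first branch
    return 1.0 if rng.random() < 0.85 else 0.0  # p_x ~ Bernoulli(0.85)

rng = random.Random(0)
n = 100_000
outcomes = [example_command(6.0, rng) for _ in range(n)]
frac_sin = sum(o == math.sin(6.0) for o in outcomes) / n
frac_one = sum(o == 1.0 for o in outcomes) / n
print(f"P(first branch) ~ {frac_sin:.3f}, P(x' = 1) ~ {frac_one:.3f}")
```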


5.2  The  SReach  Algorithm 

A recently proposed δ-complete decision procedure [94] relaxes the reachability problem for HAs in a sound manner: it verifies a conservative approximation of the system behavior, so that bugs are always detected. The over-approximation can be tight (tunable by an arbitrarily small rational parameter δ), and a false alarm with a small δ may indicate that the system is fragile, thereby providing valuable information to the system designer. (See Section 4.2 for more details about the δ-complete decision procedure.) We now define the probabilistic bounded δ-reachability problem based on the bounded δ-reachability problem defined in [147].

Definition 5.2.1 The probabilistic bounded k-step δ-reachability problem for an HA_p H_p is to compute the probability that H_p reaches the target region T in k steps. Given the set of independent random variables r, a probability measure Pr(r) over r, and the sample space Ω of r, the reachability probability is

$$
\int_{\Omega} I_T(r)\, d\Pr(r),
$$

where I_T(r) is the indicator function, which is 1 if H_p with r reaches T in k steps and 0 otherwise.

Definition 5.2.2 For a PHA_r H_r, the probabilistic bounded k-step δ-reachability estimated by SReach is the maximal probability that H_r reaches the target region T in k steps, where the maximum is taken over the executions σ ∈ E, and E is the set of possible executions of H_r starting from the initial state i.

As shown in Figure 5.1, given a stochastic hybrid system, SReach first encodes its uncertainties using random variables and then samples them according to the given distributions. For each sample, a corresponding intermediate HA is generated by replacing the random variables with their assigned values. The δ-complete analyzer dReach is then used to analyze each intermediate HA M_i, together with the desired precision δ and unfolding depth k. The analyzer returns either unsat or δ-sat for M_i. This information is then used by a chosen statistical testing procedure to decide whether to stop or to repeat the procedure, and to return the estimated probability. The full procedure is given in Algorithm 3, where M_P is a given stochastic model, and ST indicates which
Algorithm 3 SReach

function SReach(M_P, ST, δ, k)
    if M_P is an HA_p then
        M_P ← EncRM1(M_P)              ▷ encode uncertain system parameters
    else                                ▷ otherwise a PHA_r
        M_P ← EncRM2(M_P)              ▷ encode probabilistic jumps and extra randomness
    end if
    Succ, N ← 0                         ▷ number of δ-sat samples and total samples
    Assgn ← ∅                           ▷ record unique sampling assignments and dReach results
    RV ← ExtractRV(M_P)                 ▷ get the RVs from the probabilistic model
    repeat in parallel
        S_i ← Sim(RV)                   ▷ sample the parameters
        if S_i ∈ Assgn.sample then
            Res ← Assgn(S_i).res        ▷ no need to call dReach
        else
            M_i ← Gen(M_P, S_i)         ▷ generate a dReach model
            Res ← dReach(M_i, δ, k)     ▷ call dReach to solve k-step δ-reachability
            Assgn(S_i).res ← Res        ▷ record the new assignment and its result
        end if
        if Res = δ-sat then Succ ← Succ + 1
        end if
        N ← N + 1
    until ST.done(Succ, N)              ▷ perform statistical test
    return ST.output
end function
Figure 5.1: The framework of the SReach algorithm


statistical testing method will be used. Succ and N record the number of δ-sat instances and the total number of samples generated so far, respectively, and serve as the inputs of ST. Note that, for a PHA_r, sampling and fixing the choices of all probabilistic transitions in advance results in an over-approximation of the original PHA_r, under which safety properties are preserved. To ensure a tight over-approximation and correct probability estimates, SReach supports PHA_rs with no or only subtle non-determinism. That is, to offer a reasonable estimate for PHA_rs, SReach should be used on models with no or few non-deterministic transitions, or where the dynamic interleaving between non-deterministic and probabilistic choices is not important.

To improve the performance of SReach, each sampled assignment and its corresponding dReach result are recorded to avoid redundant calls to dReach. This significantly reduces the total number of calls for PHA_rs, as the sample space of the random variables describing probabilistic jumps is comparatively small. For the example PHA (shown in Figure 5.2), this heuristic decreased the total checking time from 11291.31s for 658 samples (17.16s per sample) to 3295.82s (5.01s per sample). Furthermore, a parallel version of SReach has been implemented using OpenMP, where multiple samples and the corresponding HAs are generated and passed to dReach simultaneously. Using this parallel SReach on a 4-core machine, the running time for the example PHA was further decreased to 2119.55s for 660 samples (3.33s per sample).
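The caching heuristic described above amounts to memoizing the solver on sampled assignments. The following is a minimal Python sketch under stated assumptions, not SReach's code: `estimate_with_cache`, `sample` and `solve` are hypothetical names, `solve` stands in for a call to dReach (returning True for δ-sat), and each sampled assignment is assumed to be hashable, for instance a tuple of sampled values.

```python
import random
from typing import Callable, Dict, Hashable, Tuple


def estimate_with_cache(
    sample: Callable[[random.Random], Hashable],
    solve: Callable[[Hashable], bool],
    n_samples: int,
    seed: int = 0,
) -> Tuple[float, int]:
    """Monte Carlo loop that reuses solver results for repeated assignments.

    Returns (fraction of delta-sat samples, number of actual solver calls).
    """
    rng = random.Random(seed)
    cache: Dict[Hashable, bool] = {}  # sampled assignment -> solver result
    succ = 0
    for _ in range(n_samples):
        s = sample(rng)
        if s not in cache:  # only call the expensive solver on new assignments
            cache[s] = solve(s)
        if cache[s]:
            succ += 1
    return succ / n_samples, len(cache)
```

For example, with two probabilistic jumps that each pick one of two branches, at most four distinct assignments exist, so at most four solver calls are made no matter how many samples are drawn. The savings are therefore largest when the sample space is small and discrete, as for PHA_rs; for continuous parameters almost every sample is new and the cache rarely hits.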

Currently, SReach supports a number of hypothesis testing methods (Lai's test [151], the Bayes factor test [139], the Bayes factor test with indifference region [229], and the sequential probability ratio test (SPRT) [215]) and statistical estimation techniques (the Chernoff-Hoeffding bound [121], Bayesian interval estimation with Beta prior [234], and direct sampling). All methods, described below, produce answers that are correct up to a precision that can be set arbitrarily by the user.

Lai's test [151]. As a simple class of sequential tests, it tests the one-sided composite hypotheses H_0: θ ≤ θ_0 versus H_1: θ ≥ θ_1 for the natural parameter θ of an exponential family of distributions under the 0-1 loss and cost c per observation. [151] shows that these tests have nearly optimal frequentist properties and also provide approximate Bayes solutions with respect to a large class of priors.

Bayes factor test [139]. The use of Bayes factors is a Bayesian alternative to classical hypothesis testing, based on Bayes' theorem. Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, as the Bayesian form avoids model selection bias, evaluates evidence in favor of the null hypothesis, includes model uncertainty, and allows non-nested models to be compared. Also, frequentist significance tests become biased in favor of rejecting the null hypothesis with sufficiently large sample sizes.
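To make the Bayes factor test concrete, the sketch below assumes the usual Bayesian statistical model checking setup, not necessarily SReach's exact implementation: hypotheses H_0: p ≥ θ versus H_1: p < θ, a Beta(α, β) prior with positive integer α and β, and the stopping rule B ≥ T (accept H_0) or B ≤ 1/T (accept H_1) for a ratio threshold T > 1. Integer prior parameters let the Beta CDF be computed as a binomial tail with the standard library only. All function names are hypothetical.

```python
from math import exp, lgamma, log


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses I_x(a, b) = P[Binomial(a + b - 1, x) >= a], summing terms in log space.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    lx, l1x = log(x), log(1.0 - x)
    total = 0.0
    for j in range(a, n + 1):
        log_comb = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
        total += exp(log_comb + j * lx + (n - j) * l1x)
    return min(total, 1.0)


def bayes_factor(succ: int, n: int, theta: float, alpha: int = 1, beta: int = 1) -> float:
    """Bayes factor of H0: p >= theta against H1: p < theta under a Beta(alpha, beta) prior."""
    prior_h1 = beta_cdf_int(theta, alpha, beta)                   # P(H1) under the prior
    post_h1 = beta_cdf_int(theta, succ + alpha, n - succ + beta)  # P(H1 | data)
    if post_h1 == 0.0:
        return float("inf")
    # posterior odds of H0 divided by prior odds of H0
    return ((1.0 - post_h1) / post_h1) * (prior_h1 / (1.0 - prior_h1))


def bayes_factor_test(outcomes, theta: float, T: float, alpha: int = 1, beta: int = 1):
    """Consume Boolean outcomes (True = delta-sat) until B >= T or B <= 1/T."""
    succ = n = 0
    for outcome in outcomes:
        n += 1
        succ += int(bool(outcome))
        bf = bayes_factor(succ, n, theta, alpha, beta)
        if bf >= T:
            return "accept H0", n
        if bf <= 1.0 / T:
            return "accept H1", n
    return "undecided", n
```

With a uniform prior (α = β = 1), θ = 0.5 and T = 100, six consecutive δ-sat samples give B = 2^7 − 1 = 127 ≥ 100, so the test accepts H_0 after six samples.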

Bayes factor test with indifference region. A hypothesis test has ideal performance if the probability of the Type-I error (respectively, Type-II error) is exactly α (respectively, β). However, these requirements make it impossible to ensure a low probability for both types of errors simultaneously (see [229] for details). A solution is to use an indifference region, which sets a distance between the two hypotheses so that they are separated.

Figure 5.2: An example probabilistic hybrid automaton

Sequential probability ratio test (SPRT) [215]. The SPRT considers a simple hypothesis H_0: θ = θ_0 against a simple alternative H_1: θ = θ_1. With the probability ratio Λ_n computed from the first n observations and two thresholds A and B, the SPRT decides that H_0 is true and stops when Λ_n ≤ A. It decides that H_1 is true and terminates if Λ_n ≥ B. If A < Λ_n < B, it collects another observation to obtain a new ratio Λ_{n+1}. The SPRT is optimal in the sense that, among all sequential tests with no larger error probabilities, it minimizes the expected sample size under both H_0 and H_1.
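A minimal sketch of the SPRT for Bernoulli outcomes (δ-sat or not) follows. It uses Wald's classical approximate thresholds A = β/(1 − α) and B = (1 − β)/α for target error probabilities α and β, and tracks the logarithm of Λ_n. How SReach's SPRT option maps onto p_0, p_1, α and β is not specified here, so this parameterization is an assumption, and `sprt` is a hypothetical name.

```python
from math import log


def sprt(outcomes, p0: float, p1: float, alpha: float, beta: float):
    """Wald's SPRT for Bernoulli data: H0: p = p0 versus H1: p = p1 (p0 < p1)."""
    log_a = log(beta / (1 - alpha))   # lower threshold ln A
    log_b = log((1 - beta) / alpha)   # upper threshold ln B
    llr = 0.0                         # ln(Lambda_n), the log probability ratio
    n = 0
    for o in outcomes:
        n += 1
        llr += log(p1 / p0) if o else log((1 - p1) / (1 - p0))
        if llr <= log_a:
            return "accept H0", n
        if llr >= log_b:
            return "accept H1", n
    return "undecided", n
```

For example, with p_0 = 0.3, p_1 = 0.7 and α = β = 0.01, each δ-sat sample adds ln(7/3) ≈ 0.847 to the log-ratio, so six consecutive δ-sat samples exceed ln B = ln 99 ≈ 4.595 and H_1 is accepted.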

Chernoff-Hoeffding bound [121]. To estimate the mean p of a (bounded) random variable, given a precision δ′ and a coverage probability c, the Chernoff-Hoeffding bound determines how many samples are needed so that the sample mean p′ satisfies |p′ − p| < δ′ with probability at least c.
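The bound translates directly into a sample size. By Hoeffding's inequality, P(|p′ − p| ≥ δ′) ≤ 2 exp(−2Nδ′²), so N = ⌈ln(2/(1 − c)) / (2δ′²)⌉ samples suffice for coverage c. The sketch below (hypothetical function names) computes this N and the corresponding sample mean.

```python
from math import ceil, log


def chb_sample_size(precision: float, coverage: float) -> int:
    """Smallest N with 2 * exp(-2 * N * precision**2) <= 1 - coverage (Hoeffding's inequality)."""
    return ceil(log(2.0 / (1.0 - coverage)) / (2.0 * precision ** 2))


def chb_estimate(draw, precision: float, coverage: float):
    """Sample mean of N Bernoulli outcomes from draw(), with N chosen by the bound above."""
    n = chb_sample_size(precision, coverage)
    return sum(1 for _ in range(n) if draw()) / n, n
```

For δ′ = 0.01 and c = 0.99 this gives N = 26,492 samples, independently of the unknown p.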

Bayesian Interval Estimation with Beta prior [234]. This method estimates p, the unknown probability that a randomly sampled model satisfies a specified reachability property. The estimate takes the form of an interval that contains p with an arbitrarily high probability. [234] assumes that the unknown p is given by a random variable, whose density is called the prior density, and focuses on Beta priors.
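A minimal sketch of sequential Bayesian interval estimation with a Beta prior, in the style of the method cited above (SReach's exact stopping rule may differ), is given below. After n samples with x δ-sat outcomes the posterior is Beta(x + α, n − x + β), the estimate is the posterior mean, and sampling stops once the posterior probability of an interval of half-width δ around the estimate reaches the coverage c. Integer α and β are assumed so that the Beta CDF reduces to a binomial tail; all names are hypothetical.

```python
from math import exp, lgamma, log


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a, b, via a binomial tail in log space."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    lx, l1x = log(x), log(1.0 - x)
    total = 0.0
    for j in range(a, n + 1):
        log_comb = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
        total += exp(log_comb + j * lx + (n - j) * l1x)
    return min(total, 1.0)


def bayesian_interval_estimate(outcomes, delta: float, coverage: float,
                               alpha: int = 1, beta: int = 1):
    """Return (p_hat, (t0, t1), n) once the interval has enough posterior mass, else None."""
    succ = n = 0
    for outcome in outcomes:
        n += 1
        succ += int(bool(outcome))
        a, b = succ + alpha, n - succ + beta    # posterior Beta(a, b)
        p_hat = a / (a + b)                     # posterior mean
        t0, t1 = p_hat - delta, p_hat + delta
        if t1 > 1.0:                            # keep the interval inside [0, 1]
            t0, t1 = 1.0 - 2 * delta, 1.0
        elif t0 < 0.0:
            t0, t1 = 0.0, 2 * delta
        gamma = beta_cdf_int(t1, a, b) - beta_cdf_int(t0, a, b)
        if gamma >= coverage:
            return p_hat, (t0, t1), n
    return None
```

With α = β = 1, δ = 0.01, c = 0.99 and only δ-sat outcomes, the procedure stops after 227 samples with estimate 228/229 ≈ 0.9956. Each step sums O(n) terms, so this version is meant for illustration rather than for very long runs.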

Direct sampling. Given the number N of samples to draw, the direct sampling method estimates the mean p of a (bounded) random variable. According to the central limit theorem, with confidence c the error e between the real probability p and the estimate $\hat{p}$ is bounded:

$$e = |p - \hat{p}| \le \Phi^{-1}\left(\frac{1+c}{2}\right)\sqrt{\frac{\hat{p}(1-\hat{p})}{N}},$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$. That is, as N goes to ∞, the estimated probability approaches the real one.
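The direct sampling estimate and its normal-approximation (CLT) error bound can be sketched as follows; `direct_sampling` and `draw` are hypothetical names, and `draw` is assumed to return True when a sampled model is δ-sat.

```python
from statistics import NormalDist


def direct_sampling(draw, n: int, confidence: float):
    """Return (p_hat, half_width) from n Bernoulli draws.

    half_width = Phi^{-1}((1 + confidence) / 2) * sqrt(p_hat * (1 - p_hat) / n),
    the usual normal-approximation error bound.
    """
    successes = sum(1 for _ in range(n) if draw())
    p_hat = successes / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_hat, z * (p_hat * (1 - p_hat) / n) ** 0.5
```

Note that the half-width collapses to 0 when every sample agrees (p̂ = 0 or 1), a well-known weakness of this normal approximation that the Chernoff-Hoeffding and Bayesian methods do not share.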

With these hypothesis testing methods, SReach can answer qualitative questions, such as “Does the model satisfy a given reachability property in k steps with probability greater than a certain threshold?” With the above statistical estimation techniques, SReach can answer quantitative questions, for instance, “What is the probability that the model satisfies a given reachability property in k steps?” SReach can also handle additional types of problems by encoding them as probabilistic bounded reachability problems. The model validation/falsification problem with prior knowledge can be encoded as a probabilistic bounded reachability question: after expressing prior knowledge about the given model as reachability properties, is there any number of steps k in which the model satisfies a given property with a desirable probability? If none exists, the model is incorrect with respect to the given prior knowledge. The parameter synthesis problem can also be encoded as a probabilistic k-step reachability problem: does there exist a parameter combination for which the model reaches the given goal region in k steps with a desirable probability? If so, this parameter combination is potentially a good estimate of the system parameters. The goal here is to find a combination with which all the given goal regions can be reached in a bounded number of steps. Moreover, sensitivity analysis can be conducted through a set of probabilistic bounded reachability queries as well: are the results of reachability analysis the same for different possible values of a certain system parameter? If so, the model is insensitive to this parameter with regard to the given prior knowledge.


5.3  The  SReach  Tool 

Input  Format 

The inputs to our SReach tool are descriptions of (probabilistic) hybrid automata with random variables (representing the probabilistic system parameters and probabilistic jumps), and the reachability property to be checked. Following roughly the same format as the above definition of (probabilistic) hybrid automata, with added declarations of random variables, the description of an automaton is as follows.

Preprocessor. C preprocessor syntax (e.g., #define) can be used to define constants and macros.

Variable declaration. For a random variable, the declaration specifies its distribution and name. Variables that are not random variables must be declared with bounds (e.g., [0,5] x).

(Probabilistic) Hybrid automaton. A (probabilistic) hybrid automaton is represented by a set of modes. Within each mode declaration, we can specify statements for the mode invariant(s), flow function(s), and (probabilistic) jump condition(s). For a mode invariant, any logic formula of the variables can be given. A flow function is expressed by an ODE. A non-probabilistic jump condition is written as

<logic_formula1> ==> @<target_mode> <logic_formula2>,

where the first logic formula is the guard of the jump, and the second one specifies the reset condition after the jump. A probabilistic jump condition needs an extra constraint to express the stochastic choice, and has the following form

(and <logic_formula1> <stochastic choice>) ==>

@<target_mode> <logic_formula2>,

where  the  stochastic  choice  is  a  formula  indicating  which  probabilistic  transition  will  be  chosen 
for  this  jump. 
Initial conditions and Goals. Following the declaration of modes, one initial mode with its corresponding conditions is declared, followed by the reachability properties at the end.

Example 5.1. The following is an example input file for a hybrid automaton with parametric uncertainty. Currently, users can specify random variables (representing certain system parameters) with a Bernoulli distribution (B), uniform distribution (U), Gaussian distribution (N), exponential distribution (E), or a general discrete distribution with given possible values and corresponding probabilities (DD).


#define pi 3.1416
N(1,0.1) mu1;
U(10,15) thro;
E(0.49) theta1;
B(0.75) xinit;
DD(0:0.7, 1:0.3) mu2;

[0,5] x;
[0,3] time;
{ mode 1;
  invt:
    (x <= 1.5);
    (x >= 0);
  flow:
    d/dt[x] = thro * (1 / (theta1 * sqrt(2 * pi)))
              * exp(0 - ((x - mu1 + mu2)^2) / (2 * theta1^2));
  jump:
    (x >= (threl + 5)) ==> @2 (x' = x);
}
init:
@1 (x = xinit);
goal:
@4 (x >= 0.5);
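The Sim(RV) step of Algorithm 3 draws one value for each declared random variable. A minimal Python sketch for the declarations of Example 5.1 follows; it is not SReach's sampler, and the parameter conventions are assumptions: the second argument of N is taken as a standard deviation, the argument of E as a rate, B(p) yields 1 with probability p and 0 otherwise, and DD lists value:probability pairs. All helper names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Sampler = Callable[[random.Random], float]


def bernoulli(p: float) -> Sampler:
    return lambda rng: 1.0 if rng.random() < p else 0.0


def uniform(a: float, b: float) -> Sampler:
    return lambda rng: rng.uniform(a, b)


def gaussian(mu: float, sigma: float) -> Sampler:  # assumes sigma is a standard deviation
    return lambda rng: rng.gauss(mu, sigma)


def exponential(rate: float) -> Sampler:  # assumes the parameter is a rate
    return lambda rng: rng.expovariate(rate)


def discrete(pairs: List[Tuple[float, float]]) -> Sampler:
    values, weights = zip(*pairs)
    return lambda rng: rng.choices(values, weights=weights)[0]


# Random variables of Example 5.1 (distribution conventions are assumptions)
EXAMPLE_5_1: Dict[str, Sampler] = {
    "mu1": gaussian(1, 0.1),
    "thro": uniform(10, 15),
    "theta1": exponential(0.49),
    "xinit": bernoulli(0.75),
    "mu2": discrete([(0, 0.7), (1, 0.3)]),
}


def sample_assignment(rvs: Dict[str, Sampler], rng: random.Random) -> Dict[str, float]:
    """One joint sample S_i; each variable is drawn independently."""
    return {name: draw(rng) for name, draw in rvs.items()}
```

Each call returns one joint assignment S_i, which would then be substituted into the model to obtain an ordinary hybrid automaton for dReach.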


Example 5.2. This example demonstrates the format of the input file for a probabilistic hybrid automaton with additional randomness for transition probabilities. Note that, unlike declarations of random variables representing system parameters and probabilistic transitions, declarations of random variables that express additional randomness in jump probabilities start with the prefix j.

j U(0.7, 0.9) pjumprv;
DD(1:pjumprv, 2:(1 - pjumprv)) pjump1;
DD(1:0.3, 2:0.7) pjump2;
[-1000, 1000] x;
[-1000, 1000] y;
[0, 3] time;

{ mode 1;
  invt:
    (x <= 2);
    (x >= 0);
    (y <= 7.7);
    (y >= -3);
  flow:
    d/dt[x] = x * y;
    d/dt[y] = 3 * x - y;
  jump:
    (and (abs(y) * x^2 <= x / 2) (pjump1 = 1)) ==> @1 (and (x' >= sin(y)) (y' <= 4 * y));
    (and (abs(y) * x^2 <= x / 2) (pjump1 = 2)) ==> @2 (and (x' <= 3.1) (y' = 2 * x));
    (and (cos(x) <= 0) (pjump2 = 1)) ==> @2 (and (x' = x) (y' = y));
    (and (cos(x) <= 0) (pjump2 = 2)) ==> @1 (and (x' = x) (y' = y));
}

{ mode 2;
  invt:
    (x <= 200);
    (x >= -2.2);
    (y <= 85.1);
    (y >= 2);
  flow:
    d/dt[x] = x;
    d/dt[y] = 3*x-y^2;
  jump:
    (and (x <= 1000) (x >= -1000) (y <= 1000) (y >= -1000)) ==> @2 (and (x' = x) (y' = y));
}
init:
@1 (and (x >= 0.1) (x <= 1.4) (y = 1.1));

goal:
@2 (and (x >= -10) (y >= -10));
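The j-prefixed variable in Example 5.2 adds a second level of randomness: the probability of the first branch of pjump1 is itself random. A minimal sketch of one joint draw follows; `sample_example_5_2` is a hypothetical function that mirrors the declarations line by line. Following Algorithm 3, such a draw fixes all probabilistic choices of one sample in advance.

```python
import random
from typing import Dict


def sample_example_5_2(rng: random.Random) -> Dict[str, float]:
    """One joint draw of the random variables declared in Example 5.2."""
    pjumprv = rng.uniform(0.7, 0.9)                                  # j U(0.7, 0.9) pjumprv;
    pjump1 = rng.choices([1, 2], weights=[pjumprv, 1 - pjumprv])[0]  # DD(1:pjumprv, 2:(1 - pjumprv)) pjump1;
    pjump2 = rng.choices([1, 2], weights=[0.3, 0.7])[0]              # DD(1:0.3, 2:0.7) pjump2;
    return {"pjumprv": pjumprv, "pjump1": pjump1, "pjump2": pjump2}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    share = sum(sample_example_5_2(rng)["pjump1"] == 1 for _ in range(n)) / n
    print(f"fraction of samples taking branch 1 of pjump1: {share:.3f}")  # close to 0.8
```

Averaged over pjumprv, the first branch of pjump1 is taken with probability E[pjumprv] = 0.8, which the Monte Carlo check in the main block reproduces approximately.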

Command  Line 

SReach can be run in two ways. It can be run sequentially by typing

sreach_sq <statistical_testing_option> <filename> <dReach> <k> <delta>

or in parallel by typing

sreach_para <statistical_testing_option> <filename> <dReach> <k> <delta>


where: 

• statistical_testing_option is a text file containing a sequence of test specifications. The available statistical testing options are introduced below;

• filename is a .pdrh file describing the model of a hybrid system with probabilistic system parameters, in the input format described in the previous subsection;

• dReach is a tool for bounded reachability analysis of hybrid systems based on dReal;

• k is the number of steps of the model that the tool will explore; and

• delta is the precision for the δ-decision problem.

Statistical  Testing  Options 

SReach can be used with different statistical testing methods through the following specifications.

Lai’s  test:  Lai  <theta>  <cost_per_sample>,  where  theta  indicates  the  probability 
threshold. 

Bayes factor test: BFT <theta> <T> <alpha> <beta>, where theta is a probability threshold satisfying 0 < theta < 1, T is a ratio threshold satisfying T > 1, and alpha and beta are the Beta prior parameters.

BFT with indifference region: BFTI <theta> <T> <alpha> <beta> <delta>, where, besides the parameters used in the above Bayes factor test, delta is used to create the indifference region [p0, p1], with p0 = theta - delta and p1 = theta + delta. It then tests H0: p ≥ p1 against H1: p ≤ p0.

Sequential  probability  ratio  test  (SPRT):  SPRT  <theta>  <T>  <delta>. 

Chernoff-Hoeffding  bound:  CHB  <deltal>  <coverage_probability>,  where 
deltal  is  the  given  precision,  and  coverage_probability  indicates  the  confidence. 

Bayesian  Interval  Estimation  with  Beta  prior: 

BEST <deltal> <coverage_probability> <alpha> <beta>.

Direct/Naive  Sampling:  NSAM  <num_of_samples>. 

Both sequential and parallel versions of SReach are available at https://github.com/dreal/SReach.
5.4  Case Studies


5.4.1  Atrial  Fibrillation 


The heart rhythm is enabled by the electrical activity of cardiac muscle cells, which make up the atria and ventricles. The electrical dynamics of cardiac cells are governed by the organized opening and closing of ion channel gates on the cell membrane. Improper functioning of the cardiac cell ionic channels can cause the cells to lose excitability, which disrupts electric wave propagation and leads to cardiac abnormalities such as ventricular tachycardia or fibrillation. Mathematical modeling of the dynamics of cardiac cells is important for understanding the mechanisms of cardiac disorders. [49] developed an extremely versatile electrical model for cardiac cells, referred to as the minimal resistor model (MRM), which reproduces experimentally measured characteristics of human ventricular cell dynamics.

MRM (see Figure 5.3) contains 4 state variables and 26 parameters. An action potential (AP) is a change in the cell's transmembrane potential u in response to an external stimulus (current) e. The flow of total currents is controlled by a fast channel gate v and two slow gates w and s. In Mode 1, gates v and w are open and gate s is closed. The transmembrane potassium current causes the decay of u. The cell is resting and waiting for stimulation. We assume that the external stimulus e equals 1 and lasts for 1 millisecond. The stimulation causes u to increase, which may trigger jump_{1→2}: u > θ_o. In Mode 2, v starts closing and the decay rate of u changes. The system jumps to Mode 3 if u > θ_w. In Mode 3, w is also closing, and u is governed by the potassium current and the calcium current. When u > θ_v, Mode 4 can be reached, which means a successful AP initiation. In Mode 4, u reaches its peak due to the fast opening of sodium channels. The cardiac muscle contracts and u starts decreasing.

MRM reduces the complexity of existing models by representing the channel gates of different ions with one fast channel and two slow gates. Identifying the parameter ranges for which the MRM accurately reproduces cardiac abnormalities will benefit the development of treatments for cardiac disorders. However, due to the simplification, the values of most model parameters cannot be obtained through measurements. After adding parametric uncertainty into the original hybrid model, we show that SReach can be adapted to synthesize parameters for this stochastic model, i.e., to identify appropriate ranges and distributions for model parameters. We chose two system parameters, EPITO1 and EPITO2, and varied their distributions to see which ones allow the model to present the desired patterns. As shown in Table 5.1, when EPITO1 is either close to 400 or between 0.0061 and 0.007, and EPITO2 is close to 6, the model satisfies the given bounded reachability property with a probability very close to 1. The analysis for this model was conducted on a server with two AMD Opteron(tm) 6172 processors and 32GB RAM (12 cores were used), running Ubuntu 14.04.1 LTS. In our experiments, we used 0.001 as the precision for the δ-decision problem, and Bayesian sequential estimation with 0.01 as the estimation error bound, coverage probability 0.99, and a uniform prior (α = β = 1).
Figure 5.3: The minimal resistor model of cardiac cells (Modes 1 to 4)


| Model | #RVs | EPITO1 | EPITO2 | #S_S | #T_S | Est_P | A_T(s) | T_T(s) |
|---|---|---|---|---|---|---|---|---|
| Cd_to1_s | 1 | U(6.1e-3, 7e-3) | 6 | 240 | 240 | 0.996 | 0.270 | 64.80 |
| Cd_to1_uns | 1 | U(5.5e-3, 5.9e-3) | 6 | 0 | 240 | 0.004 | 0.042 | 10.08 |
| Cd_to2_s | 1 | 400 | U(0.131, 6) | 240 | 240 | 0.996 | 0.231 | 55.36 |
| Cd_to2_uns | 1 | 400 | U(0.1, 0.129) | 0 | 240 | 0.004 | 0.038 | 9.15 |
| Cd_to12_s | 2 | N(400, 1e-4) | N(6, 1e-4) | 240 | 240 | 0.996 | 0.091 | 21.87 |
| Cd_to12_uns | 2 | N(5.5e-3, 10e-6) | N(0.11, 10e-5) | 0 | 240 | 0.004 | 0.037 | 8.90 |

Table 5.1: Results for the 4-mode atrial fibrillation model (k = 3). For each sample generated, SReach analyzed systems with 62 variables and 24 ODEs in the unfolded SMT formulae. #RVs = number of random variables in the model, #S_S = number of δ-sat samples, #T_S = total number of samples, Est_P = estimated probability of the property, A_T(s) = average CPU time per sample in seconds, and T_T(s) = total CPU time for all samples in seconds. The same notation is used in the remaining tables.


5.4.2  Prostate  Cancer  Treatment 

Prostate cancer is the second leading cause of cancer-related deaths among men in the United States [95]. Hormone therapy in the form of androgen deprivation has been a cornerstone of the management of advanced prostate cancer for several decades. However, controversy remains regarding its optimal application [47]. Continuous androgen suppression (CAS) therapy has many side effects, including anemia, osteoporosis, and impotence. Further, most patients experience a relapse after a median duration of 18-24 months of CAS treatment, due to the proliferation of castration resistant cancer cells (CRCs).

In order to reduce the side effects of CAS and to delay the time to relapse, intermittent androgen suppression (IAS) was proposed, aiming to limit the duration of androgen-poor conditions and to avoid the emergence of AI cells [|4T||. In detail, IAS therapy switches between on-treatment and off-treatment modes by monitoring the serum level of a tumor marker called prostate-specific antigen (PSA): (i) when the PSA level decreases and reaches a lower threshold value r_0, androgen suppression is suspended; (ii) when the PSA level increases and reaches an upper threshold value r_1, androgen suppression is resumed by the administration of medical agents.

Recent clinical phase II and III trials confirm that IAS has significant advantages in terms of quality of life and cost. However, with respect to time to relapse and cancer-specific survival, the clinical trials suggest that the extent to which IAS is superior to CAS depends on the individual patient and the on- and off-treatment scheme [42, 43, 109]. Thus, a crucial unsolved problem is how to design a personalized treatment scheme for each individual to achieve maximum therapeutic efficacy.

In [154], Liu et al. recently proposed a nonlinear hybrid model to reproduce the clinical observations [42, 43] of prostate cancer cell dynamics in response to IAS therapy. It is known that the proliferation and survival of prostate cancer cells depend on the levels of androgens, specifically testosterone and 5α-dihydrotestosterone (DHT). This model considers two distinct subpopulations of prostate cancer cells: hormone sensitive cells (HSCs) and castration resistant cells (CRCs). Androgen deprivation can lead to remarkable decreases in the proliferation and survival rates of HSCs, but it also up-regulates the conversion from HSCs to CRCs, which keep proliferating under low androgen levels.

The model has two modes, which are shown in Figure 5.4. x(t), y(t), and z(t) represent the population of AD cells, the population of AI cells, and the serum androgen concentration, respectively. The growth dynamics of AD and AI cells are governed by their proliferation rate, apoptosis rate, and mutation rate from the AD to the AI phenotype, depending on the androgen concentration z(t). The PSA level v (ng ml^-1) is defined as v(t) = x(t) + y(t). The treatment is suspended or restarted according to the values of v and dv/dt. In mode 2 (off-treatment), the androgen concentration is maintained at the normal level z_0 by homeostasis. In mode 1 (on-treatment), the androgen is cleared at a rate 1/τ.
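The switching rule of this two-mode model can be sketched as a small guard function. This is an illustration under assumptions, not the model's implementation: mode 1 is on-treatment and mode 2 off-treatment as above, the guards use v = x + y and dv/dt = dx/dt + dy/dt with the PSA thresholds r_0 and r_1, and the jump is taken as soon as its guard holds (a hybrid automaton only requires that the guard hold). `next_mode` is a hypothetical name.

```python
def next_mode(mode: int, x: float, y: float, dx: float, dy: float, r0: float, r1: float) -> int:
    """Return the mode after checking the jump guards (1 = on-treatment, 2 = off-treatment).

    x, y are the AD and AI cell populations and dx, dy their time derivatives;
    v = x + y is the PSA level.
    """
    v, dv = x + y, dx + dy
    if mode == 1 and v <= r0 and dv < 0:  # PSA has fallen to r0: suspend androgen suppression
        return 2
    if mode == 2 and v >= r1 and dv > 0:  # PSA has risen to r1: resume androgen suppression
        return 1
    return mode
```

For the thresholds r_0 = 10 and r_1 = 15, for example, the treatment is switched off only once the PSA level has dropped to 10 while still decreasing, and switched on again only once it has climbed to 15 while still increasing.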

This two-mode model captures the intermittent androgen suppression (IAS) therapy that switches treatment ON and OFF according to the serum level thresholds of prostate-specific antigen, namely r_0 and r_1. As suggested by the clinical trials [44], an effective IAS therapy highly depends on the individual patient. Thus, we modified this two-mode model by taking into account parametric variation caused by personalized differences. In detail, according to clinical data from hundreds of patients [45], we replaced six system parameters with random variables having appropriate (continuous) distributions: α_x (the proliferation rate of androgen-dependent (AD) cells), α_y (the proliferation rate of androgen-independent (AI) cells), β_x (the apoptosis rate of AD cells), β_y (the apoptosis rate of AI cells), m_1 (the mutation rate from AD to AI cells), and z_0 (the normal androgen level). To describe the variations due to individual differences, we assigned α_x ~ U(0.0193, 0.0214), α_y ~ U(0.0230, 0.0254), β_x ~ U(0.0072, 0.0079), β_y ~ U(0.0160, 0.0176), m_1 ~ U(0.0000475, 0.0000525), and z_0 ~ N(30.0, 0.001). We used SReach to estimate the probabilities of preventing the relapse of prostate cancer with three distinct pairs of treatment thresholds (i.e., combinations of r_0 and r_1). As shown in Table 5.2, the model with thresholds r_0 = 10 and r_1 = 15 has the highest estimated probability, approaching 1, indicating that these thresholds may be considered for general treatment. The experiment for this stochastic model was conducted on a server with two AMD Opteron(tm) 6172 processors and 32GB RAM (12 cores were used), running Ubuntu 14.04.1 LTS. In our experiments we used 0.001 as the precision for the δ-decision problem, and Bayesian sequential estimation with 0.01 as the estimation error bound, coverage probability 0.99, and a uniform prior (α = β = 1).
Figure 5.4: A hybrid automaton model for prostate cancer hormone therapy (jump_{1→2}: x + y ≤ r_0 ∧ dx/dt + dy/dt < 0; jump_{2→1}: x + y ≥ r_1 ∧ dx/dt + dy/dt > 0)


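| Model | #RVs | r_0 | r_1 | Est_P | #S_S | #T_S | A_T(s) | T_T(s) |
|---|---|---|---|---|---|---|---|---|
| PCT1 | 6 | 5.0 | 10.0 | 0.496 | 8226 | 16584 | 0.596 | 9892 |
| PCT2 | 6 | 7.0 | 11.0 | 0.994 | 335 | 336 | 54.307 | 18247 |
| PCT3 | 6 | 10.0 | 15.0 | 0.996 | 240 | 240 | 506.5 | 121560 |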

Table  5.2:  Results  for  the  2-mode  prostate  cancer  treatment  model  (k  =  2).  For  each  sample  generated,  SReach  analyzed  systems  with  41  variables 
and  10  ODEs  in  the  unfolded  SMT  formulae. 
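As a quick consistency check (our observation, not a statement about how SReach computes Est_P): under the uniform prior used in these experiments, the reported Est_P values in Tables 5.1 and 5.2 agree to three decimals with the Beta posterior mean (#S_S + 1)/(#T_S + 2). The function name below is hypothetical.

```python
def posterior_mean(successes: int, n: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Mean of the Beta(successes + alpha, n - successes + beta) posterior."""
    return (successes + alpha) / (n + alpha + beta)


# (#S_S, #T_S, reported Est_P) taken from Tables 5.1 and 5.2
ROWS = [(240, 240, 0.996), (0, 240, 0.004), (8226, 16584, 0.496), (335, 336, 0.994)]

for s, n, est in ROWS:
    print(s, n, est, round(posterior_mean(s, n), 3))
```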


5.4.3  Tap  Withdrawal  Circuit  in  C.  elegans 

Note.  This  case  study  is  led  by  Md.  Ariful  Islam. 

Due to the simplicity of its nervous system (302 neurons, ~5,000 synapses) and the breadth of research on the animal, C. elegans, the common roundworm, is a model system for neuroscience. The complete connectome of the worm is documented [40, 221], and a number of interesting experiments have been carried out on its locomotory neural circuits connecting sensory neurons to motor neurons ifTTl 11021 [14Q[|233M. Of particular interest is the Tap Withdrawal (TW) neuronal circuit that governs the reactionary motion of the animal when the petri dish in which it swims is subjected to a mechanical tap [134]. (A related circuit, touch sensitivity, controls the reaction of the worm when a stimulus is applied to a single point on the body.) The term “tap withdrawal” refers to the fact that worms swimming in a petri dish tend to withdraw (turn around and swim in the opposite direction) when subjected to a tap stimulus. Presumably, this is because the tap causes them to sense danger in their surrounding environment. The worms, however, can be conditioned or habituated to ignore this stimulus [184].


Studies of the TW circuit have traditionally involved using lasers to ablate different neurons in the circuit of multiple animals, and then measuring the response behavior when tap stimuli are applied [24]. Such is the case for [222]; see also Fig. 5.6. Such behaviors are logged as the percentage of the experimental population displaying each behavior. Moreover, with the aim of predicting synaptic polarities (unknown parameters) of the TW circuit, the dynamics of the membrane potential of different neurons has been mathematically modeled [223]. This model is a system of nonlinear ODEs with an indication of the polarity (inhibitory or excitatory) of each neuron in the circuit. The Wicks et al. circuit model has a significant number of parameters, including gap-junction conductance, membrane capacitance, and leakage current, that decisively affect the circuit's behavior. Fixed values for these parameters have been provided based on measurements performed on single in-vitro neurons [223]. The model therefore reproduces the predominant behavior in most ablation groups, with a few exceptions. While the experimental work and the model of Wicks et al. were by no means insubstantial, the exploration of the model is far from complete. Fixed parameter values fit through experimentation cause the model to replicate the predominant behavior seen in these experiments, but little can be gained beyond that. Not all such animals are created equal, owing to genetic variation, and, during their lifetime, they are exposed to stimuli of varying intensity, duration, and frequency. Carefully and (semi-)exhaustively varying the circuit parameters of the [223] model should provide insights into the processes underlying these variations, and ultimately help us to understand the learning process in neural circuits.


Towards this end, we use SReach to perform bounded-time reachability analysis on the TW circuit model of Wicks et al. (1996) to estimate the probability of various TW responses under parameter uncertainty, and thus to derive the population percentages that exhibit various behaviors in response to tap stimuli. This case study shows that SReach can handle large-scale systems for which traditional reachability analysis may not scale.


Tap  Withdrawal  Neuronal  Circuit  in  C.  elegans 

In C. elegans, there are three classes of neurons: sensory, inter, and motor. For the TW circuit, the sensory neurons are PLM, PVD, ALM, and AVM, and the inter-neurons are AVD, DVA, PVC, AVA, and AVB. The model we are using abstracts away the motor neurons as simply forward and reverse movement. Neurons are connected in two ways: electrically via bi-directional gap junctions, and chemically via uni-directional chemical synapses. Each connection has a varying degree of throughput, and each neuron can be excitatory or inhibitory, governing the polarity of transmitted signals. These polarities were experimentally determined in [223] and used to produce the circuit shown in Fig. 5.5. In [222], Wicks et al. performed a series of laser ablation experiments in which they knocked out neurons in a group of animals (worms), subjected them to a tapped surface, and recorded the magnitude and direction of the resulting behavior. Fig. 5.6 shows the response types for each of their experiments.
Figure  5.5:  Tap  Withdrawal  Circuit  of  C.  elegans.  Rectangle:  Sensory  Neurons;  Circle:  Interneurons;  Dashed  Undirected  Edge:  Gap  Junction;
Solid  Directed  Edge:  Chemical  Synapse;  Edge  Label:  Number  of  Connections;  Dark  Gray:  Excitatory  Neuron;  Light  Gray:  Inhibitory  Neuron;
White:  Unknown  Polarity.  FWD:  Forward  Motor  System;  REV:  Reverse  Motor  System.

Figure  5.6:  Effect  of  ablation  on  the  Tap  Withdrawal  reflex  (experimental  results)  [222].  The  length  of  each  bar  indicates  the  fraction  of  the  population
demonstrating  the  particular  behavior.


Mathematical  Model  of  Tap  Withdrawal  Neuronal  Circuit 

The  dynamics  of  a  neuron's  membrane  potential  $V$  are  determined  by  the  internal  state  of  the
neuron  together  with  the  sum  of  all  input  currents  [145],  written  as:

$$\frac{dV}{dt} = \frac{V^{leak} - V}{CR} + \frac{1}{C}\left(\sum I^{gap} + \sum I^{syn} + I^{stim}\right)$$

where  $V$  represents  the  membrane  potential,  $C$  is  the  membrane  capacitance,  $R$  is  the  membrane
resistance,  $V^{leak}$  is  the  leakage  potential,  $I^{gap}$  and  $I^{syn}$  are  the  gap-junction  and  chemical-synapse
currents,  respectively,  and  $I^{stim}$  is  the  applied  external  stimulus  current.  The  summations  are  over
all  neurons  with  which  this  neuron  has  a  (gap-junction  or  synaptic)  connection.

The  current  flowing  between  neurons  $i$  and  $j$  via  $n^{gap}_{ij}$  gap  junctions  can  be  seen  as  the  current
passing  through  $n^{gap}_{ij}$  parallel  resistors.  Therefore,  by  Ohm's  law,  the  gap-junction  current  is:

$$I^{gap}_{ij} = n^{gap}_{ij}\, g^{gap}\,(V_j - V_i)$$

where  the  constant  $g^{gap}$  is  the  maximum  conductance  of  the  gap  junction,  and  $n^{gap}_{ij}$  is  the  number  of
gap  junctions  between  neurons  $i$  and  $j$.  The  conductance  $g^{gap}$  defines  the  strength  of  the  connection
between  two  neurons  and,  as  a  consequence,  determines  how  much  information  the  two  neurons
share.  This  key  parameter  significantly  affects  the  behavior  of  the  neural  circuit.
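As a small illustration of the gap-junction current above, the Python sketch below treats the $n^{gap}_{ij}$ identical junctions as a single conductance $n^{gap}_{ij} g^{gap}$ and applies Ohm's law. The function name and the numbers (3 junctions of 1 nS, potentials in mV, so the currents come out in pA) are hypothetical and chosen only for illustration.

```python
def gap_current(n_ij: int, g_gap: float, v_i: float, v_j: float) -> float:
    """Current into neuron i through n_ij identical gap junctions.

    n_ij conductances g_gap in parallel act like one conductance n_ij * g_gap,
    so Ohm's law gives I_ij = n_ij * g_gap * (V_j - V_i).
    """
    return n_ij * g_gap * (v_j - v_i)


# Hypothetical values: 3 junctions of 1 nS each; nS * mV = pA.
into_i = gap_current(3, 1.0, v_i=-30.0, v_j=-20.0)
into_j = gap_current(3, 1.0, v_i=-20.0, v_j=-30.0)
print(into_i, into_j)  # 30.0 -30.0
```

Because a gap junction is bidirectional, the current entering neuron $i$ is exactly the current leaving neuron $j$, which the two calls show.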

Chemical  synapses  transfer  information  by  releasing  neurotransmitters  [135].  Inspired  by  the
Hodgkin-Huxley  model  of  ionic  channels  [120],  one  can  model  this  behavior  as  a  synaptic  current
flowing  from  presynaptic  neuron  $j$  to  postsynaptic  neuron  $i$:

$$I^{syn}_{ij} = n^{syn}_{ij}\, g^{syn}_j(t)\,(E_j - V_i)$$

where  $g^{syn}_j(t)$  is  the  voltage-dependent  synaptic  conductance,  which  depends  on  the  potential  $V_j$  of  the
presynaptic  neuron,  $n^{syn}_{ij}$  is  the  number  of  synaptic  connections  from  neuron  $j$  to  neuron  $i$,  and  $E_j$  is
the  reversal  potential  of  neuron  $j$  for  the  synaptic  conductance.

The  chemical  synapse  is  characterized  by  a  synaptic  sign,  or  polarity,  specifying  whether  the  synapse
is  excitatory  or  inhibitory.  The  value  of  $E_j$  is  assumed  to  be  constant  for  a  given  synaptic  sign.

For  a  C.  elegans  neuron  at  equilibrium,  the  membrane  potential  is  on  average  around  -30  mV.
According  to  the  synaptic  current  equation  above,  setting  the  reversal  potential  higher  than  the  resting
potential  of  a  neuron  makes  the  synaptic  current  positive  (depolarizing),  so  an  excitatory  behavior  is  realized.
Conversely,  an  inhibitory  synapse  is  obtained  by  placing  the  reversal  potential  below  the  equilibrium
potential  of  the  neuron.
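The sign argument can be checked with a few lines of Python. The sketch evaluates the synaptic current at the average resting potential of -30 mV quoted above; the reversal potentials 0 mV and -48 mV, the conductance values, and the function names are hypothetical, not values taken from [223].

```python
V_REST = -30.0  # mV, average equilibrium potential quoted in the text


def synaptic_current(n_syn: int, g_syn: float, e_rev: float, v_post: float) -> float:
    """Chemical-synapse current into the postsynaptic neuron: n * g * (E_j - V_i)."""
    return n_syn * g_syn * (e_rev - v_post)


def polarity(e_rev: float, v_rest: float = V_REST) -> str:
    """Classify a synapse by where its reversal potential sits relative to rest.

    E_rev == v_rest is lumped with "inhibitory" here: such a synapse neither
    depolarizes nor hyperpolarizes a neuron at rest.
    """
    return "excitatory" if e_rev > v_rest else "inhibitory"


# Hypothetical reversal potentials: 0 mV and -48 mV.
for e in (0.0, -48.0):
    print(polarity(e), synaptic_current(2, 0.5, e, V_REST))
# excitatory 30.0
# inhibitory -18.0
```

A positive current depolarizes the postsynaptic neuron at rest, so a reversal potential above -30 mV yields excitation and one below it yields inhibition.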

The  dynamics  of  the  synaptic  conductance  depend  on  the  membrane  potential  $V_j$  of  the  presynaptic
neuron.  For  the  sake  of  simplicity,  Wicks  et  al.  model  these  dynamics  by  the  steady-state
response  of  the  synapse:

$$g^{syn}_j(t) = g^{syn}_{\infty}(V_j)$$

where  the  conductance  at  steady  state  is  given  by:

$$g^{syn}_{\infty}(V_j) = \frac{\bar{g}^{syn}_j}{1 + \exp\left(k\,\frac{V_j - V^{eq}_j}{V^{Range}}\right)}$$

Here  $\bar{g}^{syn}_j$  is  the  maximum  synaptic  conductance,  $V^{eq}_j$  is  the  presynaptic  equilibrium  potential,
$V^{Range}$  is  the  presynaptic  voltage  range  over  which  the  synapse  is  activated,  and  $k$  is  an
experimentally  derived  constant  with  value  -4.3944.
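The constant $k = -4.3944$ equals $-2\ln 9$ to four decimals, so the sigmoid is at 50% of $\bar{g}^{syn}_j$ when $V_j = V^{eq}_j$ and at 10% and 90% when $V_j$ lies half of $V^{Range}$ below or above it; in other words, $V^{Range}$ is the presynaptic window over which the synapse goes from 10% to 90% activation. The Python sketch below checks this numerically; $V^{eq}_j = -30$ mV and $V^{Range} = 20$ mV are illustrative values only.

```python
import math

K = -4.3944  # experimentally derived constant from the text


def g_syn_inf(v_pre: float, v_eq: float, v_range: float, g_max: float) -> float:
    """Steady-state synaptic conductance as a sigmoid of the presynaptic potential."""
    return g_max / (1.0 + math.exp(K * (v_pre - v_eq) / v_range))


print(round(-2 * math.log(9), 4))             # -4.3944
for v in (-40.0, -30.0, -20.0):               # hypothetical V_eq = -30 mV, V_range = 20 mV
    print(v, round(g_syn_inf(v, -30.0, 20.0, 1.0), 4))
# -40.0 0.1
# -30.0 0.5
# -20.0 0.9
```

Because $k$ is negative, raising the presynaptic potential drives the exponential toward zero and the conductance toward its maximum.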
Combining  all  of  the  above  pieces,  the  mathematical  model  of  the  TW  circuit  is  a  system  of
nonlinear  ODEs  in  which  each  state  variable  is  the  membrane  potential  of  one  neuron  in  the
circuit.  Consider  a  circuit  with  $N$  neurons.  The  dynamics  of  the  $i$th  neuron  of  the  circuit  are
given  by:

$$\frac{dV_i}{dt} = \frac{V^{leak}_i - V_i}{C_i R_i} + \frac{1}{C_i}\left(\sum_{j=1}^{N}\left(I^{gap}_{ij} + I^{syn}_{ij}\right) + I^{stim}_i\right) \qquad (5.1)$$

$$I^{gap}_{ij} = n^{gap}_{ij}\, g^{gap}\,(V_j - V_i) \qquad (5.2)$$

$$I^{syn}_{ij} = n^{syn}_{ij}\, g^{syn}_j\,(E_j - V_i) \qquad (5.3)$$

$$g^{syn}_j = \frac{\bar{g}^{syn}_j}{1 + \exp\left(k\,\frac{V_j - V^{eq}_j}{V^{Range}}\right)} \qquad (5.4)$$
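To make the structure of Eqs. (5.1) to (5.4) concrete, the Python sketch below evaluates the right-hand side for all $N$ neurons and integrates it with fixed-step forward Euler. It is a plain numerical simulation, not the reachability analysis that SReach performs. It assumes a single gap-junction conductance $g^{gap}$ shared by all junctions, as in the equations; the dictionary keys, the two-neuron parameter values, and the step size are hypothetical.

```python
import math
from typing import Dict, List, Sequence

K = -4.3944  # sigmoid constant from the text


def tw_rhs(v: Sequence[float], p: Dict) -> List[float]:
    """dV_i/dt from Eq. (5.1) with the current models (5.2)-(5.4) substituted."""
    n = len(v)
    dv = []
    for i in range(n):
        total = p["i_stim"][i]
        for j in range(n):
            total += p["n_gap"][i][j] * p["g_gap"] * (v[j] - v[i])
            g_syn = p["g_syn_max"][j] / (
                1.0 + math.exp(K * (v[j] - p["v_eq"][j]) / p["v_range"]))
            total += p["n_syn"][i][j] * g_syn * (p["e_rev"][j] - v[i])
        leak = (p["v_leak"][i] - v[i]) / (p["c"][i] * p["r"][i])
        dv.append(leak + total / p["c"][i])
    return dv


def euler(v0: Sequence[float], p: Dict, dt: float, steps: int) -> List[float]:
    """Fixed-step forward Euler; a plain simulation, not a reachability check."""
    v = list(v0)
    for _ in range(steps):
        v = [vi + dt * di for vi, di in zip(v, tw_rhs(v, p))]
    return v


# Hypothetical two-neuron example with no connections and no stimulus:
params = {
    "c": [1.0, 1.0], "r": [1.0, 1.0], "v_leak": [-35.0, -35.0],
    "n_gap": [[0, 0], [0, 0]], "g_gap": 1.0,
    "n_syn": [[0, 0], [0, 0]], "g_syn_max": [1.0, 1.0], "e_rev": [0.0, 0.0],
    "v_eq": [-30.0, -30.0], "v_range": 20.0, "i_stim": [0.0, 0.0],
}
v_end = euler([-20.0, -50.0], params, dt=0.01, steps=2000)
# Without coupling, each neuron relaxes toward its own V_leak (-35 mV).
```

Filling in non-zero `n_gap` or `n_syn` entries couples the neurons exactly as the summations in Eq. (5.1) prescribe.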


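The  equilibrium  potentials  ($V^{eq}$)  of  the  neurons  are  computed  by  setting  the  left-hand  side  of
Eq.  (5.1)  to  zero  [223].  At  equilibrium  $V_j = V^{eq}_j$,  so  each  sigmoid  in  Eq.  (5.4)  evaluates  to  $\bar{g}^{syn}_j/2$,
and  the  conditions  become  a  system  of  linear  equations  that  can  be  solved  as  follows:

$$V^{eq} = A^{-1} b \qquad (5.5)$$

where  matrix  $A$  is  given  by:

$$A_{ij} = \begin{cases} -R_i\, n^{gap}_{ij}\, g^{gap} & \text{if } i \neq j \\ 1 + R_i \sum_{k=1}^{N}\left(n^{gap}_{ik}\, g^{gap} + n^{syn}_{ik}\,\bar{g}^{syn}_k/2\right) & \text{if } i = j \end{cases}$$

and  vector  $b$  is  written  as:

$$b_i = V^{leak}_i + R_i \sum_{j=1}^{N} E_j\, n^{syn}_{ij}\,\bar{g}^{syn}_j/2$$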
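Solving Eq. (5.5) is a small dense linear solve. The Python sketch below builds $A$ and $b$ following the derivation from Eq. (5.1) with every sigmoid at its midpoint, assuming no self gap junctions (gap terms with $j = i$ are skipped), and solves the system with hand-written Gaussian elimination so that it needs only the standard library; a library solver such as NumPy's `numpy.linalg.solve` would be the natural choice in practice. The function names and the two-neuron parameter values are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def equilibrium(r, v_leak, n_gap, g_gap, n_syn, g_syn_max, e_rev) -> List[float]:
    """Build A and b of Eq. (5.5) and return V_eq = A^{-1} b.

    At equilibrium every presynaptic sigmoid sits at its midpoint, so each
    synapse contributes half of its maximum conductance.
    """
    n = len(r)
    a = [[0.0] * n for _ in range(n)]
    b = list(v_leak)
    for i in range(n):
        a[i][i] = 1.0
        for j in range(n):
            half_g = g_syn_max[j] / 2.0
            if j != i:
                a[i][j] -= r[i] * n_gap[i][j] * g_gap
                a[i][i] += r[i] * n_gap[i][j] * g_gap
            a[i][i] += r[i] * n_syn[i][j] * half_g
            b[i] += r[i] * e_rev[j] * n_syn[i][j] * half_g
    return solve(a, b)


# Hypothetical pair: one gap junction between 0 and 1, one synapse 1 -> 0.
v_eq = equilibrium(r=[1.0, 1.0], v_leak=[-35.0, -35.0],
                   n_gap=[[0, 1], [1, 0]], g_gap=0.5,
                   n_syn=[[0, 1], [0, 0]], g_syn_max=[0.2, 0.2],
                   e_rev=[0.0, 0.0])
```

In this toy pair the excitatory synapse (reversal potential 0 mV) pulls neuron 0 above its leak potential, and the gap junction drags neuron 1 slightly along with it.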


Tap  Withdrawal  Response  Patterns 

The  Wicks  et  al.  model  does  not  explicitly  incorporate  nematode  locomotion.  It  simply  defines
the  relationship  between  the  animal's  locomotion  and  the  activation  of  the  TW  circuit  that  controls  this
behavior.

Wicks  et  al.  assume  that  the  output  of  the  TW  circuit  controls  locomotory  behavior  primarily
through  the  action  of  the  interneurons  AVB  and  AVA.  The  AVA  interneurons  make  gap  junctions
and  chemical  synapses  with  the  motor  neurons  AS,  VA,  and  DA  that  excite  backward  locomotion,
whereas  the  AVB  interneurons  form  gap  junctions  with  the  motor  neurons  VB  and  DB  that  excite
forward  locomotion.  Thus,  Wicks  et  al.  simply  assume  that  the  degree  of  backward  (forward)
locomotion  is  proportional  to  the  depolarization  of  the  AVA  (AVB)  interneuron  and  inversely
proportional  to  the  depolarization  of  the  AVB  (AVA)  interneuron.  Recently,  Kawano  et  al.  presented  a
study  [143]  that  supports  the  assumptions  made  by  Wicks  et  al.  on  the  directional  movement  of
C.  elegans.  Through  in  vivo  calcium  imaging,  electrophysiology,  and  behavioral  analyses  of  wild-type
animals  and  innexin  mutants,  they  show  that  the  initiation  of  reversal  movement  is  directly
correlated  with  an  increased  calcium  level  in  AVA.  In  contrast,  the  initiation  of  forward  movement
is  associated  with  an  increased  calcium  level  in  AVB,  and  a  decrease  of  this  calcium  transient  is
correlated  with  either  a  reduced  forward  velocity  or  reversals.

Figure  5.7:  TBD.

Under  standard  laboratory  culture  conditions,  the  animal  predominantly  generates  continuous
forward  movement  without  any  tap  stimulation  [104],  [174].  In  [222],  Wicks  et  al.  observed  in  vivo
only  three  tap  withdrawal  responses:  reversal,  acceleration,  and  no  response.  Simulation  of  the
Wicks  et  al.  model  for  certain  ablation  groups  (e.g.,  the  AVM,PVC-ablation  group),  however,  shows
that  the  animal  can  also  predominantly  generate  continuous  backward  movement  without  any  tap
stimulation.  Additionally,  the  study  of  Kawano  et  al.  provides  evidence  of  a  new  type  of  response,
deceleration  (reduced  forward  velocity).  These  observations  lead  us  to  believe  that,  at  least  in  theory,
there  can  be  a  few  more  tap  withdrawal  responses  than  those  Wicks  et  al.  observed  in  their  wet-lab
experiments.

Similar  to  [143],  [223],  we  consider  that  the  directional  movement  can  be  inferred  from  the
voltage  difference  between  the  AVA  and  AVB  interneurons:

•  Forward  movement:  $V_{AVB} > V_{AVA}$

•  Backward  movement:  $V_{AVB} < V_{AVA}$

•  No  movement:  $V_{AVB} \approx V_{AVA}$


Let  $\sigma_i = V^i_{AVB} - V^i_{AVA}$,  $i \in \{1, 2\}$,  be  the  voltage  differences  between  the  AVB  and  AVA
interneurons  during  the  non-stimulation  and  stimulation  periods,  respectively,  as  shown  in  Fig.  5.7,  and
let  $\epsilon$  be  some  small  positive  number.  Based  on  $\sigma_i$,  we  categorize  the  TW  responses  into  two  subgroups:

•  When  $\sigma_1 \geq 0$:

1.  Reversal:  $\sigma_2 \leq -\epsilon$

2.  No  response:  $|\sigma_2| \leq \epsilon$

3.  Forward  acceleration:  $(\sigma_2 \geq \epsilon) \wedge (\sigma_2 \geq \sigma_1)$

4.  Forward  deceleration:  $(\sigma_2 \geq \epsilon) \wedge (\sigma_2 < \sigma_1)$

•  When  $\sigma_1 < 0$:

1.  Forward:  $\sigma_2 \geq \epsilon$

2.  No  response:  $|\sigma_2| \leq \epsilon$

3.  Backward  acceleration:  $(\sigma_2 \leq -\epsilon) \wedge (\sigma_2 \leq \sigma_1)$

4.  Backward  deceleration:  $(\sigma_2 \leq -\epsilon) \wedge (\sigma_2 > \sigma_1)$

Figure  5.8:  Different  tap  withdrawal  responses  when  the  animal  moves  in  the  forward  direction  before  the  tap
stimulation  is  applied:  (a)  Reversal,  (b)  No  response,  (c)  Forward  acceleration,  (d)  Forward  deceleration.
Each  panel  plots  V  (mV)  against  t  (ms).

Fig.  5.8  shows  all  four  response  patterns  for  the  first  subgroup.  The  response  patterns  for  the
second  subgroup  have  a  similar  structure  if  AVB  and  AVA  are  interchanged  in  the  figure.
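The two rule lists can be sketched as a single Python classification function. The labels are the abbreviations used in Tables 5.3 and 5.4. Two choices are assumptions of this sketch: ties on a boundary such as $|\sigma_2| = \epsilon$ are resolved by checking the rules in list order, and backward deceleration is read as a backward response ($\sigma_2 \leq -\epsilon$) that is weaker than before the tap ($\sigma_2 > \sigma_1$), by symmetry with forward deceleration.

```python
def classify(sigma1: float, sigma2: float, eps: float) -> str:
    """Map (sigma_1, sigma_2) to a TW response label; rules checked in list order."""
    if sigma1 >= 0:                      # moving forward before the tap
        if sigma2 <= -eps:
            return "REV"                 # reversal
        if abs(sigma2) <= eps:
            return "NR"                  # no response
        return "F-ACC" if sigma2 >= sigma1 else "F-DEC"
    if sigma2 >= eps:                    # moving backward before the tap
        return "FWD"
    if abs(sigma2) <= eps:
        return "NR"
    return "B-ACC" if sigma2 <= sigma1 else "B-DEC"


EPS = 1e-4  # volts, i.e. 0.1 mV, the value used in the model analysis
print(classify(0.005, 0.008, EPS), classify(-0.005, -0.002, EPS))  # F-ACC B-DEC
```

Because each subgroup's branches are exhaustive, every pair $(\sigma_1, \sigma_2)$ receives exactly one of the seven labels.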


Normalization  of  the  Wicks  et  al.  Model 


SReach  internally  uses  dReach  [147],  which  relies  on  numerical  computation.  As  the  values
of  the  parameters  in  the  Wicks  et  al.  model  are  on  the  order  of  $10^{-9}$  to  $10^{-12}$,  the  computation
often  suffers  from  numerical  instability.  To  address  this  issue,  we  normalize  the  Wicks
et  al.  model  with  respect  to  the  capacitance,  which  is  a  common  practice  in  modeling  biological
systems  [187].  The  values  of  the  parameters  in  the  normalized  model  are  on  the  order  of  $10$  to  $10^{3}$.


To  normalize,  we  combine  Eqs.  (5.1)  to  (5.4).  Letting  $g^{leak}_i = 1/(R_i C_i)$,  $g^{gap}_i = g^{gap}/C_i$,
$g^{syn}_{ij} = \bar{g}^{syn}_j/C_i$,  and  $I^{ext}_i = I^{stim}_i/C_i$,  the  normalized  system  dynamics  can  be  written  as:

$$\frac{dV_i}{dt} = g^{leak}_i\,(V^{leak}_i - V_i) + g^{gap}_i \sum_{j=1}^{N} n^{gap}_{ij}\,(V_j - V_i) + \sum_{j=1}^{N} \frac{n^{syn}_{ij}\, g^{syn}_{ij}\,(E_j - V_i)}{1 + \exp\left(k\,\frac{V_j - V^{eq}_j}{V^{Range}}\right)} + I^{ext}_i \qquad (5.6)$$

Hybrid  automaton  for  TW  circuit  $M_{TW}$:

For  the  TW  circuit,  Wicks  et  al.  model  the  tap  stimulus  as  a  phasic  current  that  is  applied
simultaneously  to  the  sensory  neurons  (AVM,  ALM,  and  PLM).  The  phasic  current  is  typically  a
square-wave  signal  with  a  fixed  duration.  Because  this  signal  is  piecewise  continuous,  we
represent  the  Wicks  et  al.  model  as  a  hybrid  automaton  by  dividing  the  dynamics  into  stimulus
and  non-stimulus  modes.  Additionally,  when  a  tap  is  applied  to  the  worm,  the  worm  is  assumed  to  be
operating  in  a  stable  condition.  To  take  this  into  account,  we  apply  the  stimulation  after  a
transition  period.  Assume  that  $[0, \tau_1]$  is  the  transition  period,  $[\tau_1, \tau_2]$  is  the  stimulation  period,  and
$[0, t_s]$  is  the  total  simulation  duration.  Fig.  5.9  shows  the  hybrid  automaton  $M_{TW}$  for  the  Wicks
et  al.  model.  The  subscripts  $i$  and  $j$  in  the  figure  denote  the  sensory  neurons  and  the
interneurons,  respectively.  We  add  an  additional  variable  $\tau$  to  support  time-triggered  jumps  from
one  mode  to  another.

Figure  5.9:  The  3-mode  hybrid  automaton  $M_{TW}$  for  the  Wicks  et  al.  model  (Mode  1,  Mode  2,  Mode  3).

TW  response  as  Hybrid  Automaton  $M_\phi$:

Above,  we  enumerated  all  possible  TW  responses  and  formalized  them  in  terms  of  $\sigma_1$  and  $\sigma_2$,
the  steady-state  voltage  differences  between  the  AVB  and  AVA  interneurons  during  the  non-stimulus  and
stimulation  periods.  Hence,  to  encode  a  TW  response  $\phi$  as  a  hybrid  automaton  $M_\phi$,  we  augment  $M_{TW}$
by  adding  two  state  variables,  $\sigma_1$  and  $\sigma_2$.  As  these  two  variables  measure  steady-state
voltage  differences,  which  are  constant  in  time,  their  vector  fields  are  set  to  zero.  However,
as  shown  in  Fig.  5.10,  $\sigma_1$  and  $\sigma_2$  are  reset  to  $V_{AVB} - V_{AVA}$  during  the  jumps  from  “Mode  1”  to
“Mode  2”  and  from  “Mode  2”  to  “Mode  3”,  respectively.  This  ensures  that  $\sigma_1$  and  $\sigma_2$  hold  the  correct
values  in  “Mode  3”.


Figure  5.10:  The  3-mode  hybrid  automaton  for  response  $\phi$  of  the  TW  circuit  (Mode  1,  Mode  2,  Mode  3;  goal:  $@3 \wedge \phi$).
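One trajectory of the augmented automaton $M_\phi$ can be sketched as a time-triggered simulation in Python: the timer $\tau$ decides the mode, and $\sigma_1$, $\sigma_2$ are only written at the two jumps, mirroring their zero vector fields. The function names, the two-variable toy vector field standing in for Eq. (5.6), and the window lengths are hypothetical. SReach does not work by simulating single trajectories like this; it checks bounded reachability of the goal with dReach for sampled parameter values, so the sketch only illustrates the mode structure.

```python
from typing import Callable, List, Sequence, Tuple

Field = Callable[[Sequence[float], bool], List[float]]


def run_m_phi(field: Field, v0: Sequence[float], avb: int, ava: int,
              tau1: float, tau2: float, t_s: float, dt: float
              ) -> Tuple[int, float, float]:
    """Simulate one trajectory of the 3-mode automaton with sigma bookkeeping.

    Mode 1: tau < tau1 (transition, no stimulus)
    Mode 2: tau1 <= tau < tau2 (stimulus on)
    Mode 3: tau >= tau2 (no stimulus)
    sigma_1 / sigma_2 have zero dynamics and are only reset on the jumps.
    """
    v = list(v0)
    mode, sigma1, sigma2 = 1, 0.0, 0.0
    for k in range(int(round(t_s / dt))):
        tau = k * dt                      # timer variable driving the jumps
        if mode == 1 and tau >= tau1:
            sigma1 = v[avb] - v[ava]      # reset on the jump Mode 1 -> Mode 2
            mode = 2
        if mode == 2 and tau >= tau2:
            sigma2 = v[avb] - v[ava]      # reset on the jump Mode 2 -> Mode 3
            mode = 3
        dv = field(v, mode == 2)          # stimulus is applied only in Mode 2
        v = [x + dt * d for x, d in zip(v, dv)]
    return mode, sigma1, sigma2


def toy_field(v: Sequence[float], stim: bool) -> List[float]:
    """Hypothetical stand-in for Eq. (5.6): [AVB, AVA] relax toward targets."""
    target = [0.0, 1.0] if stim else [1.0, 0.0]
    return [t - x for t, x in zip(target, v)]


mode, s1, s2 = run_m_phi(toy_field, [0.0, 0.0], avb=0, ava=1,
                         tau1=20.0, tau2=40.0, t_s=50.0, dt=0.01)
# Goal "@3 and phi": here s1 > 0 and s2 < 0, i.e. a reversal.
```

The goal $@3 \wedge \phi$ then amounts to checking that the run ended in Mode 3 and that the recorded $(\sigma_1, \sigma_2)$ satisfy the rule for response $\phi$.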


Parameter  Uncertainty 

The  C.  elegans  nervous  system  has  been  used  for  many  years  to  study  fundamental  problems  in  the
function  of  neurons  and  neuronal  circuits.  However,  owing  to  the  worm's  small  size,  techniques  to  record
electrophysiological  data  had  not  yet  been  developed  when  Wicks  et  al.  derived  the  mathematical
model  of  the  TW  circuit.

For  this  model,  Wicks  et  al.  extrapolated  the  electrophysiological  data  from  Ascaris,  a  larger
nematode  related  to  C.  elegans.  Based  on  [181],  they  first  assumed  standard  membrane  properties
per  unit  area  for  each  neuron  in  the  TW  circuit,  such  as  membrane  capacitance  and  resistance.
They  then  estimated  the  area  from  the  process  lengths  and  cell  diameters,  measured  using
electron  micrographs  and  branching  morphology  [220,  221,  226].

However,  the  diameter  of  the  neuronal  processes  varies  from  0.2  to  1.0  μm,  the  soma  diameters  vary
between  2  and  10  μm,  and  the  process  lengths  are  estimated  assuming  a  worm  length  of  1  mm  [226].
As  a  result,  the  surface  area  varies  greatly,  which  in  turn  causes  variability  in  the  membrane  properties.
Like  the  membrane  properties,  the  gap-junction  and  synaptic  conductances  vary  across  the  population  of
worms  [170].  Both  sources  of  variability  cause  parameter  uncertainty  in  the  mathematical  model  of  the
TW  circuit.

Because  of  this  parameter  uncertainty,  we  consider  each  parameter  $p$  of  the  Wicks  et  al.  model  as  a
random  variable.  As  a  result,  $M_\phi$  becomes  a  stochastic  hybrid  automaton.  Now,  for  each  response
$\phi$,  we  formulate  a  probabilistic  reachability  problem:  estimate  the  probability  that  “Mode  3”  $\wedge\ \phi$  is
reachable  in  $M_\phi$.  We  solve  this  problem  using  SReach.

Model  Analysis 

As  discussed  above,  the  values  of  the  parameters  are  determined  by  the  surface  areas  of  the
neurons,  gap  junctions,  and  synapses,  but  these  surface  areas  vary  across  the  population.
According  to  [170],  the  area  of  a  gap  junction  varies  between  1  and  10  n·cm²  (that  is,  $10^{-9}$  to  $10^{-8}$  cm²),
and  the  standard  gap-junction  conductance  per  unit  area  is  1  S/cm²  (Siemens  per  square  centimeter)  [28].
On  this  basis,  we  consider  1  to  10  nS  as  the  biologically  relevant  range  for  the  gap-junction  conductance;
the  normalized  $g^{gap}$  is  obtained  by  dividing  by  the  capacitance  of  the  corresponding  neuron.  However,  as  we
were  not  able  to  find  biologically  relevant  ranges  for  the  synaptic  and  leakage  conductances  in  the  literature,
we  treated  those  parameters  as  constants  according  to  Table  5.3  from  [223].

We  performed  our  analysis  on  the  control  group  and  five  ablation  groups.  In  each  analysis,  we  used
Bayesian  sequential  estimation  with  0.05  as  the  estimation  error  bound,  0.95  as  the  coverage
probability,  and  a  uniform  prior  ($\alpha = \beta = 1$).  For  the  initial  condition,  we  first  simulated  the  Wicks
et  al.  model  without  applying  any  stimulation  and  then  used  the  steady-state  values  of  that
simulation  as  the  initial  conditions.  We  set  the  initial  values  of  $\sigma_1$  and  $\sigma_2$  to  zero,  the  stimulus  current
to  100  pA/pF,  and  $\epsilon$  to  $10^{-4}$  (0.1  mV).  For  the  computation,  we  used  the  parallel  version  of  SReach  on  a
32-core  machine.
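The Bayesian sequential estimation step can be sketched in Python as follows. Each sample is a Bernoulli outcome (for SReach, whether the goal is reachable for one sampled parameter vector); with a Beta($\alpha$, $\beta$) prior, the posterior after $x$ successes in $n$ samples is Beta($x + \alpha$, $n - x + \beta$). Sampling stops once the posterior mass of the interval $[\hat{p} - \delta, \hat{p} + \delta]$ around the posterior mean $\hat{p}$ reaches the coverage probability $c$. The interval shift at the boundaries of $[0, 1]$, the function names, and the Beta CDF helper (valid only for integer parameters, which holds with the uniform prior) are choices made for this minimal sketch, not SReach's code.

```python
import math
import random
from typing import Callable, Tuple


def beta_cdf_int(t: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at t for integers a, b >= 1.

    Uses I_t(a, b) = P(Binomial(a + b - 1, t) >= a), summed in log space.
    """
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    m = a + b - 1
    log_t, log_1mt = math.log(t), math.log1p(-t)
    lg_m = math.lgamma(m + 1)
    total = 0.0
    for j in range(a, m + 1):
        log_binom = lg_m - math.lgamma(j + 1) - math.lgamma(m - j + 1)
        total += math.exp(log_binom + j * log_t + (m - j) * log_1mt)
    return min(total, 1.0)


def bayes_estimate(sample: Callable[[], bool], delta: float, coverage: float,
                   alpha: int = 1, beta: int = 1,
                   max_samples: int = 1_000_000) -> Tuple[float, int]:
    """Return (posterior-mean estimate, number of samples drawn)."""
    n = x = 0
    while n < max_samples:
        n += 1
        x += 1 if sample() else 0
        a, b = x + alpha, n - x + beta          # Beta posterior parameters
        p_hat = a / (a + b)                     # posterior mean
        lo, hi = p_hat - delta, p_hat + delta
        if lo < 0.0:                            # keep a width-2*delta interval in [0, 1]
            lo, hi = 0.0, 2.0 * delta
        elif hi > 1.0:
            lo, hi = 1.0 - 2.0 * delta, 1.0
        if beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b) >= coverage:
            return p_hat, n
    raise RuntimeError("sample budget exhausted before reaching the coverage")


rng = random.Random(0)
p, n = bayes_estimate(lambda: rng.random() < 0.3, delta=0.05, coverage=0.95)
```

With $\delta = 0.01$ and $c = 0.99$, the setting used for the additional benchmarks in Section 5.4.4, a run in which every sample fails stops after 227 samples with estimate $1/229 \approx 0.004$, and a run in which every sample succeeds stops after 227 samples with $228/229 \approx 0.996$; this is consistent with the rows of Table 5.5 that report 227 total samples.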

Table  5.3  shows  the  estimated  probability  of  each  TW  response  for  all  six  groups,  where  we
considered  $g^{gap}$  as  a  normal  random  variable.  In  contrast,  Table  5.4  shows  the  results  where  we
considered  it  as  a  uniform  random  variable  in  the  range  described  above.  For  the  normal
distribution,  we  took  the  values  of  $g^{gap}$  chosen  by  Wicks  et  al.  in  [223]  as  the  mean,  and  chose  the
variance  so  that  the  distribution  covers  99%  of  the  range  of  $g^{gap}$.  In
both  tables,  the  predominant  response  in  each  group  is  highlighted  in  bold.
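The text does not spell out how the variance was chosen beyond covering 99% of the range. One reading, assumed in the Python sketch below, is to take the largest $\sigma$ for which $\mu \pm z\sigma$, with $z = \Phi^{-1}(0.995) \approx 2.576$, still lies inside the range, using the nearer endpoint when the mean is off-centre. The means in the example are hypothetical; the actual Wicks et al. values are given in [223].

```python
from statistics import NormalDist


def sigma_for_coverage(mean: float, lo: float, hi: float,
                       coverage: float = 0.99) -> float:
    """Largest sigma such that [mean - z*sigma, mean + z*sigma] lies in [lo, hi].

    z is the two-sided quantile for `coverage` (about 2.576 for 99%), so the
    normal distribution puts at least `coverage` of its mass inside [lo, hi].
    """
    if not lo < mean < hi:
        raise ValueError("mean must lie strictly inside the range")
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return min(mean - lo, hi - mean) / z


# Hypothetical means inside the 1-10 nS range used in the text:
for mu in (5.5, 3.0):
    s = sigma_for_coverage(mu, 1.0, 10.0)
    dist = NormalDist(mu, s)
    print(mu, round(s, 3), round(dist.cdf(10.0) - dist.cdf(1.0), 3))
```

For a centred mean the covered mass is exactly 99%; for an off-centre mean it is somewhat larger, since the far tail is cut further out.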


The  predominant  responses  that  we  determined  from  our  analysis  for  the  four  groups  conform
to  the  predominant  responses  that  Wicks  et  al.  obtained  from  their  ablation  experiments  on
actual  worms  in  [222].  Note  that  Wicks  et  al.  did  not  differentiate  between  acceleration  and  deceleration
responses  in  either  the  forward  or  the  backward  direction.  As  a  result,  their  distributions  over  the  TW
responses  have  only  three  responses,  as  opposed  to  the  seven  responses  in  our  distributions.  In
addition  to  these  four  groups,  we  analyzed  two  new  ablation  groups:  AVM,PLM-  and
AVM,PVC-.

| Group | REV Pr | REV RT (s) | NR Pr | NR RT (s) | F-ACC Pr | F-ACC RT (s) | F-DEC Pr | F-DEC RT (s) | FWD Pr | FWD RT (s) | B-ACC Pr | B-ACC RT (s) | B-DEC Pr | B-DEC RT (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Control | **0.95** | 801.78 | 0.030 | 87.45 | 0.015 | 16.48 | 0.038 | 282.67 | 0.015 | 16.48 | 0.015 | 16.48 | 0.015 | 16.48 |
| PLM | **0.95** | 639.06 | 0.015 | 10.49 | 0.015 | 10.49 | 0.04 | 164.72 | 0.015 | 10.48 | 0.015 | 10.48 | 0.015 | 10.48 |
| ALM-AVM | 0.015 | 8.57 | 0.015 | 8.57 | **0.861** | 973.60 | 0.158 | 728.97 | 0.015 | 8.57 | 0.015 | 8.57 | 0.015 | 8.57 |
| ALM-DVA | 0.433 | 1399.37 | 0.062 | 286.47 | 0.015 | 15.325 | **0.585** | 1518.54 | 0.015 | 16.48 | 0.015 | 16.48 | 0.015 | 16.48 |
| AVM-PVC | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | **0.984** | 255.21 | 0.015 | 3.33 |
| AVM-PLM | 0.015 | 19.27 | 0.015 | 19.27 | 0.015 | 19.27 | **0.984** | 458.66 | 0.015 | 19.27 | 0.015 | 19.27 | 0.015 | 19.27 |

Table  5.3:  Estimated  probability  (Pr)  and  runtime  (RT)  for  all  response  patterns  by  considering  all  $g^{gap}$  as
normal  random  variables.  REV:  reversal;  NR:  no  response;  F-ACC  /  F-DEC:  forward  acceleration  /  deceleration;
FWD:  forward;  B-ACC  /  B-DEC:  backward  acceleration  /  deceleration.

| Group | REV Pr | REV RT (s) | NR Pr | NR RT (s) | F-ACC Pr | F-ACC RT (s) | F-DEC Pr | F-DEC RT (s) | FWD Pr | FWD RT (s) | B-ACC Pr | B-ACC RT (s) | B-DEC Pr | B-DEC RT (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Control | **0.83** | 2343.87 | 0.039 | 252.58 | 0.015 | 26.37 | 0.121 | 897.47 | 0.015 | 26.37 | 0.015 | 26.37 | 0.015 | 26.37 |
| PLM | **0.83** | 1309.73 | 0.015 | 11.33 | 0.015 | 11.33 | 0.127 | 862.53 | 0.015 | 11.33 | 0.015 | 11.33 | 0.015 | 11.33 |
| ALM-AVM | 0.015 | 9.57 | 0.015 | 9.57 | **0.689** | 1578.83 | 0.33 | 1442.56 | 0.015 | 9.57 | 0.015 | 9.57 | 0.015 | 9.57 |
| ALM-DVA | 0.414 | 1406.31 | 0.0303 | 41.04 | 0.015 | 15.70 | **0.547** | 1766.48 | 0.015 | 15.70 | 0.015 | 15.70 | 0.015 | 15.70 |
| AVM-PVC | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | 0.015 | 3.33 | **0.984** | 255.21 | 0.015 | 3.33 |
| AVM-PLM | 0.03 | 72.02 | 0.015 | 16.49 | 0.015 | 16.49 | **0.97** | 419.39 | 0.015 | 16.49 | 0.015 | 16.49 | 0.015 | 16.49 |

Table  5.4:  Estimated  probability  (Pr)  and  runtime  (RT)  for  all  response  patterns  by  considering  all  $g^{gap}$  as
uniform  random  variables.  Abbreviations  as  in  Table  5.3.

Comparing  Table  5.3  with  Table  5.4,  we  notice  that  the  estimated  probability  of  the  predominant
response  computed  by  treating  the  parameters  as  normal  random  variables  is  closer  to  the
value  obtained  by  Wicks  et  al.  This  suggests  that  the  parameters  are  more  likely  to  follow  a  normal
distribution  than  a  uniform  one.


5.4.4  Additional  Benchmarks 


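| Benchmark | #Ms | K | #ODEs | #Vs | #RVs | δ | Est | #SS | #TS | AT (s) | TT (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BBK1 | 1 | 1 | 2 | 14 | 3 | 0.001 | 0.754 | 5372 | 7126 | 0.086 | 612.836 |
| BBK5 | 1 | 5 | 2 | 38 | 3 | 0.001 | 0.059 | 209 | 3628 | 0.253 | 917.884 |
| BBwDv1 | 2 | 2 | 4 | 20 | 4 | 0.001 | 0.208 | 2206 | 10919 | 0.080 | 873.522 |
| BBwDv2K2 | 2 | 2 | 4 | 20 | 3 | 0.001 | 0.845 | 7330 | 8669 | 0.209 | 1811.821 |
| BBwDv2K8 | 2 | 8 | 4 | 56 | 3 | 0.001 | 0.207 | 2259 | 10901 | 0.858 | 9353.058 |
| Tld | 2 | 7 | 2 | 33 | 4 | 0.001 | 0.996 | 227 | 227 | 0.213 | 48.351 |
| Ted | 2 | 7 | 4 | 50 | 4 | 0.001 | 0.996 | 227 | 227 | 12.839 | 2914.448 |
| DTldK3 | 2 | 3 | 4 | 26 | 2 | 0.001 | 0.996 | 227 | 227 | 0.382 | 86.714 |
| DTldK5 | 2 | 5 | 4 | 38 | 2 | 0.001 | 0.161 | 1442 | 8961 | 0.280 | 2509.078 |
| W4mv1 | 4 | 3 | 8 | 26 | 6 | 0.001 | 0.381 | 5953 | 15639 | 0.238 | 3722.082 |
| W4mv2K3 | 4 | 3 | 8 | 26 | 6 | 0.001 | 0.996 | 227 | 227 | 0.673 | 152.771 |
| W4mv2K7 | 4 | 7 | 8 | 50 | 6 | 0.001 | 0.004 | 0 | 227 | 0.120 | 27.240 |
| DWK1 | 2 | 1 | 4 | 14 | 5 | 0.001 | 0.996 | 227 | 227 | 0.171 | 38.817 |
| DWK3 | 2 | 3 | 4 | 26 | 5 | 0.001 | 0.996 | 227 | 227 | 0.215 | 48.806 |
| DWK9 | 2 | 9 | 4 | 62 | 5 | 0.001 | 0.996 | 227 | 227 | 5.144 | 1167.688 |
| Que | 3 | 2 | 3 | 13 | 4 | 0.001 | 0.228 | 2662 | 11677 | 0.095 | 1109.315 |
| 3dOsc | 3 | 2 | 18 | 48 | 2 | 0.001 | 0.996 | 227 | 227 | 8.273 | 1877.969 |
| QuadC | 1 | 0 | 14 | 44 | 6 | 0.001 | 0.996 | 227 | 227 | 825.641 | 187420.507 |
| exPHA01 | 2 | 2 | 4 | 20 | 2 | 0.001 | 0.524 | 345 | 658 | 5.01 | 3295.82 |
| exPHA02 | 2 | 3 | 2 | 17 | 1 | 0.001 | 0.900 | 5361 | 5953 | 0.0004 | 2.35 |
| KRk5 | 6 | 5 | 84 | 194 | 2 | 0.001 | 0.544 | 8946 | 16457 | 0.122 | 2015.64 |
| KRk6 | 8 | 6 | 112 | 224 | 6 | 0.001 | 0.246 | 2032 | 8263 | 1.385 | 11444.22 |
| KRk7 | 10 | 7 | 150 | 271 | 6 | 0.001 | 0.096 | 558 | 5795 | 16.275 | 94311.18 |
| KRk8 | 7 | 8 | 105 | 303 | 6 | 0.001 | 0.004 | 0 | 227 | 0.003 | 0.58 |
| KRk9 | 9 | 9 | 135 | 335 | 6 | 0.001 | 0.004 | 0 | 227 | 0.015 | 3.43 |
| KRk10 | 11 | 10 | 165 | 367 | 6 | 0.001 | 0.004 | 0 | 227 | 0.026 | 5.92 |

Table  5.5:  #Ms  =  number  of  modes,  K  =  number  of  unfolding  steps,  #ODEs  =  number  of  ODEs  in  the  unfolded  formulae,  #Vs  =  number  of  total
variables  in  the  unfolded  formulae,  #RVs  =  number  of  random  variables  in  the  model,  δ  =  precision  used  in  dReach,  Est  =  estimated  probability,
#SS  =  number  of  samples  satisfying  the  property,  #TS  =  total  number  of  samples,  AT  =  average  time  per  sample,  TT  =  total  time.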


To  further  demonstrate  SReach's  applicability,  we  also  applied  it  to  additional  benchmarks,
including  HAps,  PHAs,  and  PHArs  with  subtle  non-determinism.  Table  5.5  shows  the  results  of
these  experiments.  These  experiments  were  conducted  with  the  sequential  version  of  SReach  on
a  machine  with  a  2.9  GHz  Intel  Core  i7  processor  and  8  GB  RAM,  running  OS  X  10.9.2.  In  our
experiments,  we  used  0.001  as  the  precision  for  the  δ-decision  problem,  and  Bayesian  sequential
estimation  with  a  0.01  half-interval  width,  coverage  probability  0.99,  and  a  uniform  prior  ($\alpha = \beta = 1$).
In  Table  5.5,  “BB”  refers  to  the  bouncing  ball  models,  “Tld”  to  the  thermostat  model
with  linear  temperature  decrease,  “Ted”  to  the  thermostat  model  with  exponential  decrease,  “DT”  to
the  dual  thermostat  models,  “W”  to  the  watertank  models,  “DW”  to  the  dual  watertank  models,  “Que”  to
the  model  of  a  queuing  system  that  has  both  nonlinear  functions  and  nondeterministic  jumps,
“3dOsc”  to  the  model  of  a  3D  oscillator,  and  “QuadC”  to  the  model  of  quadcopter  stabilization  control.
Following  these  hybrid  systems  with  parametric  uncertainty,  we  also  consider  two  example  PHAs,
“exPHA01”  and  “exPHA02”,  and  PHArs  with  trivial  non-determinism,  “KR”  (the  stochastic
version  of  our  KillerRed  models).

Model  description 

Synthesized  KillerRed  Model.  The  ODEs  missing  in  Figure  5.11  are  as  follows.

$$\frac{d[mRNA]}{dt} = k_{RNAsyn}\cdot[DNA] - k_{RNAdeg}\cdot[mRNA]$$

$$\frac{d[KR_{im}]}{dt} = k_{KRimsyn}\cdot[mRNA] - (k_{KRm} + k_{KRimdeg})\cdot[KR_{im}]$$

$$\frac{d[KR_{mdS}]}{dt} = k_{KRm}\cdot[KR_{im}] - k_{KRmdSdeg}\cdot[KR_{mdS}] \quad \text{(before turning on the light)}$$

$$\frac{d[KR_{mdS}]}{dt} = k_{KRm}\cdot[KR_{im}] + k_{KRf}\cdot[KR_{mdS^*}] + k_{KRic}\cdot[KR_{mdS^*}] + k_{KRnrd}\cdot[KR_{mdT^*}] + k_{KRsoxd1}\cdot[KR_{mdT^*}] - k_{KRex}\cdot[KR_{mdS}] - k_{KRmdSdeg}\cdot[KR_{mdS}] \quad \text{(after adding light)}$$

$$\frac{d[KR_{mdS^*}]}{dt} = k_{KRex}\cdot[KR_{mdS}] - k_{KRf}\cdot[KR_{mdS^*}] - k_{KRic}\cdot[KR_{mdS^*}] - k_{KRisc}\cdot[KR_{mdS^*}] - k_{KRmdS^*deg}\cdot[KR_{mdS^*}]$$

$$\frac{d[KR_{mdT^*}]}{dt} = k_{KRisc}\cdot[KR_{mdS^*}] - k_{KRnrd}\cdot[KR_{mdT^*}] - k_{KRsoxd1}\cdot[KR_{mdT^*}] - k_{KRsoxd2}\cdot[KR_{mdT^*}] - k_{KRmdT^*deg}\cdot[KR_{mdT^*}]$$

$$\frac{d[SOX]}{dt} = k_{KRsoxd1}\cdot[KR_{mdT^*}] + k_{KRsoxd2}\cdot[KR_{mdT^*}] - k_{SOD}\cdot V_{maxSOD}\cdot\frac{[SOX]}{K_m + [SOX]}$$

$$\frac{d[SOX_{sod}]}{dt} = k_{SOD}\cdot V_{maxSOD}\cdot\frac{[SOX]}{K_m + [SOX]}$$


Atrial  Fibrillation.  The  model  has  four  discrete  control  locations,  four  state  variables,  and
nonlinear  ODEs.  A  typical  set  of  ODEs  in  the  model  is  as  follows.  The  exponential  term  on  the
right-hand  side  of  the  ODE  for  $s$  is  the  sigmoid  function,  which  often  appears  in  models  of  biological
switches.

$$\frac{du}{dt} = e + (u - \theta_v)(u_u - u)\,v\,g_{fi} + w\,s\,g_{si} - g_{so}(u)$$

$$\frac{ds}{dt} = \frac{g_{s2}}{1 + \exp(-2k(u - u_s))} - g_{s2}\cdot s$$

$$\frac{dv}{dt} = -g_v\cdot v \qquad \frac{dw}{dt} = -g_w\cdot w$$

Figure  5.11:  A  probabilistic  hybrid  automaton  for  synthesized  phage-based  therapy  model


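Prostate  Cancer  Treatment.  The  nonlinear  ODEs  in  the  Prostate-Cancer-Treatment  model  are  as
follows.

$$\frac{dx}{dt} = \left(\alpha_x\left(k_1 + (1 - k_1)\frac{z}{z + k_2}\right) - \beta_x\left(k_3 + (1 - k_3)\frac{z}{z + k_4}\right) - m_1\left(1 - \frac{z}{z_0}\right)\right)x + c_1 x$$

$$\frac{dy}{dt} = m_1\left(1 - \frac{z}{z_0}\right)x + \left(\alpha_y\left(1 - d\frac{z}{z_0}\right) - \beta_y\right)y + c_2 y$$

$$\frac{dz}{dt} = -\frac{z}{\tau} + c_3 z$$

$$\frac{dv}{dt} = \left(\alpha_x\left(k_1 + (1 - k_1)\frac{z}{z + k_2}\right) - \beta_x\left(k_3 + (1 - k_3)\frac{z}{z + k_4}\right) - m_1\left(1 - \frac{z}{z_0}\right)\right)x + c_1 x + m_1\left(1 - \frac{z}{z_0}\right)x + \left(\alpha_y\left(1 - d\frac{z}{z_0}\right) - \beta_y\right)y + c_2 y$$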


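Electronic  Oscillator.  The  3dOsc  model  represents  an  electronic  oscillator  that  contains
nonlinear  ODEs  such  as  the  following.

$$\frac{dx}{dt} = -a_x\cdot\sin(\omega_1\cdot\tau)$$

$$\frac{dy}{dt} = -a_y\cdot\sin((\omega_1 + c_1)\cdot\tau)\cdot\sin(\omega_2)\cdot 2$$

$$\frac{dz}{dt} = -a_z\cdot\sin((\omega_2 + c_2)\cdot\tau)\cdot\cos(\omega_1)\cdot 2$$

$$\frac{d\omega_1}{dt} = -c_3\cdot\omega_1 \qquad \frac{d\omega_2}{dt} = c_4\cdot\omega_2 \qquad \frac{d\tau}{dt} = 1$$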


Quadcopter  Control.  We  developed  a  model  that  contains  the  full  dynamics  of  a  quadcopter,  and  we
use  it  to  solve  control  problems  by  answering  reachability  questions.  A  typical  set  of  the
differential  equations  is  the  following.

$$\frac{d\omega_x}{dt} = L\cdot k\cdot(\omega_1^2 - \omega_3^2)/I_{xx} - (I_{yy} - I_{zz})\,\omega_y\,\omega_z/I_{xx}$$

$$\frac{d\omega_y}{dt} = L\cdot k\cdot(\omega_2^2 - \omega_4^2)/I_{yy} - (I_{zz} - I_{xx})\,\omega_x\,\omega_z/I_{yy}$$

$$\frac{d\omega_z}{dt} = b\cdot(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2)/I_{zz} - (I_{xx} - I_{yy})\,\omega_x\,\omega_y/I_{zz}$$

$$\frac{dx_p}{dt} = (1/m)\left(\sin(\theta)\sin(\psi)\,k\,(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) - k\cdot d\cdot x_p\right)$$

$$\frac{dy_p}{dt} = (1/m)\left(-\cos(\psi)\sin(\theta)\,k\,(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) - k\cdot d\cdot y_p\right)$$

$$\frac{dz_p}{dt} = (1/m)\left(-g - \cos(\theta)\,k\,(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) - k\cdot d\cdot z_p\right)$$

$$\frac{dx}{dt} = x_p, \qquad \frac{dy}{dt} = y_p, \qquad \frac{dz}{dt} = z_p$$

These  are  complemented  by  the  kinematic  equations  that  relate  the  body  rates  $\omega_x, \omega_y, \omega_z$  to  the
Euler-angle  rates  $d\phi/dt$,  $d\theta/dt$,  and  $d\psi/dt$.


The  full  descriptions  of  all  the  models  mentioned  here  can  be  found  on  the  tool
website.
Chapter  6

Pancreatic  Cancer  Microenvironment
Model  as  a  Multiscale  Hybrid  Rule-based
Model  and  Statistical  Model  Checking


As  mentioned  in  Chapter  2,  pancreatic  cancer  (PC),  an  extremely  aggressive  disease,  is  the
fourth  leading  cause  of  cancer  death  in  the  United  States  [15]  and  the  seventh  globally  [0].  It
is  anticipated  to  become  the  second  by  2020.  For  years,  extensive  efforts  have  been  devoted  to
understanding  the  functionality  of  pancreatic  cancer  cells  (PCCs)  and  to  developing  effective
therapies  that  target  them  alone.  However,  the  poor  prognosis  for  PC  remains  largely  unchanged.
To  turn  this  tide,  the  focus  of  pancreatic  cancer  research  has  shifted  from  looking  solely
at  pancreatic  cancer  cells  towards  investigating  the  microenvironment  of  the  pancreatic  cancer.
Biologists  have  recently  noticed  that  one  factor  contributing  to  the  failure  of  systemic  therapies  may
be  the  abundant  tumor  microenvironment.  As  a  characteristic  feature  of  PC,  the  microenvironment
includes  pancreatic  stellate  cells  (PSCs),  endothelial  cells,  nerve  cells,  immune  cells,  lymphocytes,
dendritic  cells,  the  extracellular  matrix,  and  other  molecules  surrounding  PCCs  [144].  Over  the
past  decade,  evidence  has  accumulated  that  demonstrates  the  potentially  critical  functions  of
these  cells  in  regulating  the  growth,  invasion,  and  metastasis  of  PC  [81],  [85],  [86],  [144].  Among  these
cells,  PSCs  and  cancer-associated  macrophages  play  primary  roles  during  the  development  of  PC
[144].  Studies  have  confirmed  that  PSCs  are  the  primary  cells  producing  the  stromal  reaction  [16],
[20].  In  a  healthy  pancreas,  PSCs  exist  quiescently  in  the  periacinar,  perivascular,  and  periductal
spaces.  In  the  diseased  state,  however,  PSCs  are  activated  by  growth  factors,  cytokines,  and
oxidant  stress  secreted  or  induced  by  PCCs.  Activated  PSCs  then  transform  from  the  quiescent
state  to  the  myofibroblast  phenotype.  As  a  result,  they  lose  their  lipid  droplets,  actively  proliferate  and
migrate,  produce  large  amounts  of  extracellular  matrix,  and  express  cytokines,  chemokines,
and  cell  adhesion  molecules.  In  return,  the  activated  PSCs  promote  the  growth  of  PCCs.

In  this  chapter,  to  quantitatively  understand  the  microenvironment  of  PC,  we  construct  a
multicellular  model.  This  model  consists  of  the  intracellular  signaling  networks  of  pancreatic  cancer  cells
and  of  stellate  cells,  as  well  as  the  intercellular  interactions  among  them.  To  formally
describe  our  multicellular  and  multiscale  model  and  perform  formal  analysis,  we  extend  the
rule-based  language  BioNetGen  [83]  to  enable  the  formal  specification  of  not  only  the  signaling  network
within  a  single  cell,  but  also  the  interactions  among  multiple  cells.  Specifically,  we  represent  the
intercellular  dynamics  using  rules  with  continuous  variables  and  use  Boolean  Networks  (BNs)
to  capture  the  dynamics  of  the  intracellular  signaling  networks,  since  a  large  number  of  reaction
rate  constants  are  not  available  in  the  literature  and  are  difficult  to  determine  experimentally.
Our  extension  preserves  the  virtues  of  both  BNs  and  rule-based  kinetic  modeling,  while
extending  the  specification  power  to  multicellular  and  multiscale  models.  We  employ  the  stochastic
simulator  NFsim  [197]  and  statistical  model  checking  (StatMC)  [132]  to  analyze  the  system's
properties.  The  formal  analysis  results  show  that  our  model  reproduces  existing  experimental
findings  regarding  the  mutual  promotion  between  pancreatic  cancer  and  stellate  cells.  The
model  also  provides  insights  into  how  treatments  latching  onto  different  targets  could  lead  to
distinct  outcomes.  Using  the  validated  model,  we  predict  novel  (poly)pharmacological  strategies  for
improving  PC  treatment.


6.1  Signalling  Networks  within  Pancreatic  Cancer  Microenvironment

We construct a multicellular model of the pancreatic cancer microenvironment based on a comprehensive
literature search. The reaction network of the model is summarized in Figure 6.1. It consists
of three parts, colored green, blue, and purple respectively: (i) the intracellular signaling
network of PCCs, (ii) the intracellular signaling network of PSCs, and (iii) the signaling
molecules (such as growth factors and cytokines) in the extracellular space of the microenvironment,
which are ligands of the receptors expressed in PCCs and PSCs. Note that → denotes
activation/promotion/up-regulation, and -• represents inhibition/suppression/down-regulation.

Intracellular  signaling  network  of  PCCs 

Pathways  regulating  proliferation 

KRas mutation enhances proliferation [23]. Mutations of the KRas oncogene occur in the
precancerous stages with a mutational frequency of over 90%. They can lead to the continuous activation
of the RAS protein, which then constantly triggers the RAF→MEK cascade and promotes PCC
proliferation through the activation of ERK and JNK.

EGF activates and enhances proliferation [167]. Epidermal growth factor (EGF) and its corresponding
receptor (EGFR) are expressed in ~95% of PCs. EGF promotes proliferation through
the RAS→RAF→MEK→JNK cascade. It can also trigger the RAS→RAF→MEK→ERK→cJUN cascade
to secrete EGF molecules, which can then quickly bind to overexpressed EGFR again to promote
the proliferation of PCCs; this autocrine loop is believed to contribute to the devastating
nature of PC.

HER2/neu mutation also intensifies proliferation [23]. HER2/neu is another oncogene frequently
mutated during initial PC formation. Mutant HER2 can bind to EGFR to form a heterodimer,
which can activate the downstream signaling pathways of EGFR.

Figure 6.1: The pancreatic cancer microenvironment model (legend: Apoptosis, Proliferation, Migration, Activation)

bFGF promotes proliferation [130]. As a mitogenic polypeptide, bFGF can promote proliferation
through both the RAF→MEK→ERK and RAF→MEK→JNK cascades. In addition, bFGF
molecules are released through the RAF→MEK→ERK pathway, triggering another autocrine signaling
loop in PC development.

Pathways  regulating  apoptosis 

Apoptosis  is  the  most  common  mode  of  programmed  cell  death.  It  is  executed  by  caspase 
proteases  that  are  activated  by  death  receptors  or  mitochondrial  pathways. 

TGFβ1 initiates apoptosis [194]. In PCCs, transforming growth factor β1 (TGFβ1) binds to
and activates its receptor (TGFR), which in turn activates receptor-regulated SMADs (such as SMAD3)
that hetero-oligomerize with the common mediator SMAD4. After translocating to the nucleus, the
complex initiates apoptosis in the early stage of PC development.

Mutated oncogenes inhibit apoptosis. Mutated KRas and HER2/neu can inhibit apoptosis
by downregulating caspases (CASP) through the PI3K→AKT→NFκB cascade and by inhibiting Bax
(and, indirectly, CASP) via the PI3K→PIP3→AKT→⋯→BCL-XL pathway.

Pathways  regulating  autophagy 

Autophagy is a catabolic process involving the degradation of a cell's own components through
the lysosomal machinery. This pro-survival process enables a starving cell to reallocate nutrients
from unnecessary processes to essential ones. Recent studies indicate that autophagy is important
in the regulation of cancer development and progression and also affects the response of
cancer cells to anticancer therapy [119, 146].

mTOR regulates autophagy [166]. The mammalian target of rapamycin (mTOR) is a critical
regulator of autophagy. In PCCs, the upstream PI3K→PIP3→AKT pathway activates mTOR
and inhibits autophagy, whereas the MEK→ERK cascade downregulates mTOR via cJUN and enhances
autophagy.

Overexpression of anti-apoptotic factors promotes autophagy [157]. Apoptosis and autophagy
can mutually inhibit each other through crosstalk. In the initial stage of PC, the
upregulation of apoptosis leads to the inhibition of autophagy. As the cancer progresses and
apoptosis is suppressed by highly expressed anti-apoptotic factors (e.g., NFκB and
Beclin1), autophagy gradually takes the dominant role and promotes PCC survival.

Intracellular  signaling  network  of  PSCs 

Pathways  regulating  activation 

PCCs can activate the surrounding inactive PSCs through the cancer-cell-induced release of mitogenic
and fibrogenic factors, such as PDGFBB and TGFβ1. As a major growth factor regulating PSC
functions, PDGFBB activates PSCs [107] through the downstream ERK→AP1 signaling
pathway. The activation of PSCs is also mediated by TGFβ1 [107] via the TGFR→SMAD
pathway. The autocrine signaling of TGFβ1 maintains the sustained activation of PSCs. Furthermore,
the cytokine TNFα, a major secretion of tumor-associated macrophages (TAMs)
in the microenvironment, is also involved in activating PSCs [159] through binding to TNFR,
which indirectly activates NFκB.

Pathways  regulating  migration 

Migration is another characteristic function of PSCs. Activated PSCs migrate towards PCCs
and form a cocoon around the tumor cells, which could protect the tumor from therapeutic attacks.

Growth factors promote migration. Growth factors in the microenvironment, including
EGF, bFGF, and VEGF, can bind to their receptors on PSCs and activate migration
through the MAPK pathway.

PDGFBB contributes to migration [173]. PDGFBB regulates the migration of PSCs
mainly through two downstream pathways: (i) the PI3K→PIP3→AKT pathway, which mediates
PDGF-induced migration, but not proliferation, of PSCs, and (ii) the ERK→AP1 pathway, which
regulates the activation, migration, and proliferation of PSCs.

Pathways  regulating  proliferation 

Growth factors activate proliferation. In PSCs, the ERK→AP1 cascade, a key downstream component
of several signaling pathways initiated by distinct growth factors such as EGF and bFGF,
activates proliferation. Compared to inactive PSCs, active ones proliferate more rapidly.

Tumor suppressors repress proliferation. Similar to PCCs, P53, P21, and PTEN act as
suppressors of PSC proliferation.

Pathways  regulating  apoptosis 

P53 upregulates modulator of apoptosis [130]. The apoptosis of PSCs can be initiated by
P53, whose expression is regulated by the MAPK pathway.

Interactions  between  PCCs  and  PSCs 

The mechanism underlying the interplay between PCCs and PSCs is complex. In a healthy
pancreas, PSCs exist quiescently in the periacinar, perivascular, and periductal space. However, in
the diseased state, PSCs are activated by growth factors, cytokines, and oxidant stress secreted
or induced by PCCs, including EGF, bFGF, VEGF, TGFβ1, PDGF, sonic hedgehog, galectin 3,
endothelin 1, and serine protease inhibitor nexin 2 [177]. Activated PSCs then transform from
the quiescent state to the myofibroblast phenotype: they lose their lipid droplets, proliferate
actively, migrate, produce large amounts of extracellular matrix, and express
cytokines, chemokines, and cell adhesion molecules. In return, the activated PSCs promote the
growth of PCCs by secreting various factors, including stromal-derived factor 1, FGF, secreted
protein acidic and rich in cysteine, matrix metalloproteinases, small leucine-rich proteoglycans,
periostin, and collagen type I, which mediate effects on tumor growth, invasion, metastasis, and
resistance to chemotherapy [177]. Among these factors, EGF, bFGF, VEGF, TGFβ1, and PDGFBB are essential
mediators of the interplay between PCCs and PSCs and are considered in our model.
Autocrine and paracrine signaling involving EGF/bFGF [156]. EGF and bFGF can be secreted by
both PCCs and PSCs. In turn, they bind to EGFR and FGFR, respectively, on both PCCs and
PSCs to activate their proliferation and the further secretion of EGF and bFGF.

Interplay through VEGF [214]. As a proangiogenic factor, VEGF is of great
importance in the activation of PSCs and in angiogenesis during the progression of PC. VEGF
secreted by PCCs can bind to VEGFR on PSCs to activate the PI3K pathway. It further promotes
the migration of PSCs through PIP3→AKT and suppresses the transcriptional activity of P53 via
MDM2.

Autocrine and paracrine signaling involving TGFβ1 [156]. PSCs are themselves capable of synthesizing
TGFβ1, suggesting the existence of an autocrine loop that may contribute to the perpetuation
of PSC activation after an initial exogenous signal, thereby promoting the development of
pancreatic fibrosis.

Interplay through PDGFBB [177]. PDGFBB is present in the secretion of PCCs, and its production
is regulated by the TGFβ1 signaling pathway. PDGFBB can activate PSCs and also initiate their
migration and proliferation.


6.2  The  Modeling  Language 

Rule-based modeling languages are often used to specify protein-protein reactions within cells
and to capture the evolution of protein concentrations. The BioNetGen language is a representative
rule-based modeling formalism [83], which consists of three components: basic building blocks,
patterns, and rules. In our setting, in order to simultaneously simulate the dynamics of multiple
cells, interactions among cells, and intracellular reactions, we extend the expressive power of
BioNetGen by redefining its basic building blocks and introducing new types of rules for cellular
behaviors as follows.

Basic building blocks. In BioNetGen, basic building blocks are molecules that may be assembled
into complexes through bonds linking components of different molecules. To handle multiscale
dynamics (i.e., cellular and molecular levels), we allow the fundamental blocks to also be
cells or extracellular molecules. Specifically, a cell is treated as a fundamental block with subunits
corresponding to the components of its intracellular signaling network. Furthermore, extracellular
molecules (e.g., EGF) are treated as fundamental blocks without subunits.

As we use BNs to model intracellular signaling networks, each subunit of a cell takes a binary
value (it is straightforward to extend BNs to discrete models). The Boolean values “True (T)”
and “False (F)” can have different biological meanings for different types of components within
the cell. For example, for a subunit representing a cellular process (e.g., apoptosis), “T” means
the cellular process is triggered, and “F” means it is not. For a receptor, “T” means
the receptor is bound, and “F” means it is free. For a protein, “T” indicates that the protein has a
high concentration, and “F” indicates that its concentration is below the level needed to regulate
downstream targets.
Patterns. As defined in BioNetGen, patterns are used to identify a set of species that share
features. For instance, the pattern C(c1) matches both C(c1, c2~T) and C(c1, c2~F). Using
patterns offers a rich yet concise way of specifying components.
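To make the matching behavior concrete, here is a minimal Python sketch. The `Species` class and the `matches` function are hypothetical helpers written for this illustration (they are not part of BioNetGen or of our model file); they implement only the rule stated above, namely that a pattern matches every species containing all of the components it lists, with components whose state is left unspecified acting as wildcards. Bonds, which real BioNetGen patterns can also constrain, are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Species:
    """A fundamental block (cell or molecule) with its components.

    Each component maps to its Boolean state "T"/"F", or to None when the
    component carries no state (or, in a pattern, when any state is allowed).
    """
    name: str
    components: Dict[str, Optional[str]] = field(default_factory=dict)


def matches(pattern: Species, species: Species) -> bool:
    """Return True if `species` is matched by `pattern` (bonds are ignored)."""
    if pattern.name != species.name:
        return False
    for comp, state in pattern.components.items():
        if comp not in species.components:
            return False  # the pattern requires a component the species lacks
        if state is not None and species.components[comp] != state:
            return False  # the pattern fixes a state that differs
    return True  # components not mentioned in the pattern are ignored


# The example from the text: C(c1) matches both C(c1, c2~T) and C(c1, c2~F).
c_t = Species("C", {"c1": None, "c2": "T"})
c_f = Species("C", {"c1": None, "c2": "F"})
print(matches(Species("C", {"c1": None}), c_t))  # True
print(matches(Species("C", {"c1": None}), c_f))  # True
print(matches(Species("C", {"c2": "T"}), c_f))   # False
```

Leaving unmentioned components unconstrained is what keeps patterns concise: one pattern stands for every combination of the remaining subunit values.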

Rules. In BioNetGen, three types of rules are used: binding/unbinding, phosphorylation,
and dephosphorylation. Here we introduce nine types of rules to describe the cellular
processes in our model and potential therapeutic interventions. For each rule type, we
present its formal syntax followed by examples that demonstrate how it is used in our model.

Rule  1:  Ligand-receptor  binding 

<Lig> + <Cell>(<Rec>~F) -> <Cell>(<Rec>~T) <binding_rate>

Remark: On the left-hand side, the “F” value of a receptor <Rec> indicates that the receptor
is free. When a ligand <Lig> binds to it, the decrease in the number of extracellular ligands is
represented by the ligand's elimination from the right-hand side. Meanwhile, “<Rec>~T” on the
right-hand side indicates that the receptor is no longer free. Note that multiple receptors on the
surface of a cell can be modeled by setting relatively high rates for the subsequent downstream
regulation rules, which represent the rapid “release” of bound receptors. An example in our
microenvironment model is the binding between EGF and EGFR in PCCs:
“EGF + PCC(EGFR~F) -> PCC(EGFR~T) 1”.
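The effect of Rule 1 on the model state can be sketched in Python as follows. This is an illustration under stated assumptions, not the NFsim implementation: the `State` class, `binding_propensity`, and `fire_binding` are hypothetical names, cells are plain dictionaries of Boolean subunits, and we assume the standard stochastic mass-action convention that the rule fires with propensity `binding_rate × (#free ligands) × (#cells whose receptor is free)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class State:
    # Extracellular molecules are blocks without subunits, so only counts matter.
    molecules: Dict[str, int]
    # Each cell maps subunit names to Boolean values (True = "T", False = "F").
    cells: List[Dict[str, bool]] = field(default_factory=list)


def binding_propensity(state: State, lig: str, rec: str, rate: float) -> float:
    """Mass-action propensity of <Lig> + <Cell>(<Rec>~F) -> <Cell>(<Rec>~T)."""
    free_cells = sum(1 for cell in state.cells if not cell[rec])
    return rate * state.molecules.get(lig, 0) * free_cells


def fire_binding(state: State, lig: str, rec: str, cell_index: int) -> None:
    """Apply the rule once to the chosen cell, whose receptor must be free."""
    cell = state.cells[cell_index]
    if state.molecules.get(lig, 0) == 0 or cell[rec]:
        raise ValueError("rule not applicable to this cell")
    state.molecules[lig] -= 1  # the ligand is eliminated from the environment
    cell[rec] = True           # the receptor is now bound ("T")


s = State(molecules={"EGF": 3}, cells=[{"EGFR": False}, {"EGFR": True}])
print(binding_propensity(s, "EGF", "EGFR", rate=1.0))  # 3.0 (3 ligands x 1 free cell)
fire_binding(s, "EGF", "EGFR", 0)
print(s.molecules["EGF"], s.cells[0]["EGFR"])          # 2 True
print(binding_propensity(s, "EGF", "EGFR", rate=1.0))  # 0.0
```

Once every receptor is bound the propensity drops to zero, which is why the remark above relies on fast downstream rules to free receptors again.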

Rule  2:  Mutated  receptors  form  a  heterodimer 

<Cell>(<Rec1>~F, <Rec2>~F) ->

<Cell>(<Rec1>~T, <Rec2>~T) <mutated_binding_rate>

Remark: Unbound receptors can bind together and form a heterodimer. For example, in our model,
the mutated HER2 can activate the downstream pathways of EGFR by binding to it and forming a
heterodimer: “PCC(EGFR~F, HER2~F) -> PCC(EGFR~T, HER2~T) 10”.

Rule  3:  Downstream  signaling  transduction 

Rule  3.1  (Single  parent)  upregulation  (activation,  phosphorylation,  etc.) 

<Cell>(<Act>~T, <Tar>~F) ->

<Cell>(<Act>~T, <Tar>~T) <trate>

Rule  3.2  (Single  parent)  downregulation  (inhibition,  dephosphorylation,  etc.) 

<Cell>(<Inh>~T, <Tar>~T) ->

<Cell>(<Inh>~T, <Tar>~F) <trate>

Rule  3.3  (Multiple  parents)  Downstream  regulation 

<Cell>(<Inh>~F, <Act>~T, <Tar>~F) ->

<Cell>(<Inh>~F, <Act>~T, <Tar>~T) <trate>

<Cell>(<Inh>~T, <Tar>~T) ->

<Cell>(<Inh>~T, <Tar>~F) <trate>

Remark: Instead of using kinetic rules (as in ML-Rules), our language uses the logical rules of
BNs to describe intracellular signal cascades. Downstream signal transduction rules describe
the logical update functions of all intracellular molecules that make up the signaling
cascades. For instance, Rule 3.3 implements the update function

$$\langle \mathit{Tar}\rangle^{(t+1)} = \neg\langle \mathit{Inh}\rangle^{(t)} \times \left(\langle \mathit{Act}\rangle^{(t)} + \langle \mathit{Tar}\rangle^{(t)}\right)$$

where “<Inh>” is the inhibitor, “<Act>” is the activator, and “+” and “×” denote logical “OR” and
“AND”, respectively. In this manner, concise rules can be devised to handle complex cases with
multiple regulatory parents. Note that our model follows the biological assumption that inhibitors
take priority over activators in their impact on the target. An example in our model is the
activation of STAT by JAK1 in PCCs: “PCC(JAK1~T, STAT~T) -> PCC(JAK1~F, STAT~T) 0.012” and
“PCC(JAK1~T, STAT~F) -> PCC(JAK1~F, STAT~T) 0.012”.
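The claim that the two rules of Rule 3.3 implement this update function can be checked exhaustively. The short Python sketch below (function names are hypothetical) compares, for every combination of Boolean inputs, the target value produced by whichever Rule 3.3 rule applies with the value prescribed by the update function. It checks only the logical outcome of a single update; in the stochastic semantics the rules fire asynchronously, each at rate <trate>.

```python
from itertools import product


def rule_3_3(inh: bool, act: bool, tar: bool) -> bool:
    """Next value of <Tar> after applying whichever Rule 3.3 rule matches."""
    if not inh and act and not tar:  # <Inh>~F, <Act>~T, <Tar>~F  ->  <Tar>~T
        return True
    if inh and tar:                  # <Inh>~T, <Tar>~T  ->  <Tar>~F
        return False
    return tar                       # no rule matches: <Tar> keeps its value


def update_function(inh: bool, act: bool, tar: bool) -> bool:
    """Tar(t+1) = NOT Inh(t) AND (Act(t) OR Tar(t)); the inhibitor has priority."""
    return (not inh) and (act or tar)


# The rule set and the update function agree on all 2^3 input combinations.
for inh, act, tar in product([False, True], repeat=3):
    assert rule_3_3(inh, act, tar) == update_function(inh, act, tar)
print("Rule 3.3 implements the update function on all 8 states")
```

The `Tar(t)` term in the disjunction reflects that neither rule resets an active target when the inhibitor is absent, even if the activator has switched off.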

Rule  4:  Cellular  processes 

Rule  4.1  Proliferation 

<Cell>(Pro~T) ->

<Cell>(Pro~F) + <Cell>(Pro~F, ···) <pro_rate>

Remark: When a cell proliferates, we keep the current subunit values of the cell that initiates
the proliferation and assign default subunit values to the new cell. The “···” in the
rule denotes the remaining subunits with their default values.
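A minimal Python sketch of the proliferation rule, with cells represented as dictionaries of Boolean subunits. The function name, the subunit names, and the default values in `PCC_DEFAULTS` are hypothetical placeholders for illustration; the actual defaults are specified in the model file.

```python
import copy
from typing import Dict, List

# Hypothetical default subunit values for a newly created PCC (illustration only).
PCC_DEFAULTS: Dict[str, bool] = {"Pro": False, "EGFR": False, "RAS": False}


def proliferate(cells: List[Dict[str, bool]], i: int, defaults: Dict[str, bool]) -> None:
    """Apply <Cell>(Pro~T) -> <Cell>(Pro~F) + <Cell>(Pro~F, ...) to cell i."""
    parent = cells[i]
    if not parent["Pro"]:
        raise ValueError("proliferation is not triggered for this cell")
    parent["Pro"] = False              # the parent keeps all other subunit values
    daughter = copy.deepcopy(defaults)  # the new cell starts from default values
    daughter["Pro"] = False
    cells.append(daughter)


cells = [{"Pro": True, "EGFR": True, "RAS": True}]
proliferate(cells, 0, PCC_DEFAULTS)
print(cells)
# [{'Pro': False, 'EGFR': True, 'RAS': True}, {'Pro': False, 'EGFR': False, 'RAS': False}]
```

Copying the defaults (rather than sharing one dictionary) keeps each daughter cell's subunits independent of the others.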

Rule  4.2  Apoptosis 


<Cell>(Apo~T) -> Null() <apop_rate>

Remark: A type “Null()” is declared to represent dead cells or degraded molecules. In our model,
the apoptosis of PSCs is described as “PSC(Apo~T) -> Null() 5e-4”.

Rule  4.3  Autophagy 

<Cell>(Aut~T) -> <Mol> + ··· <auto_rate>

Remark: The molecules on the right-hand side of this type of rule are released into the
microenvironment due to autophagy. They are the molecules already expressed inside the cell when
autophagy is triggered.

Rule  5:  Secretion 

<Cell>(<secMol>~T) ->

<Cell>(<secMol>~F) + <Mol> <sec_rate>

Remark: When the secretion of “<Mol>” has been triggered, its amount in the microenvironment
increases by 1. Note that we can differentiate endogenous and exogenous
molecules by labeling the secreted “<Mol>” with the cell name. In our model, we have

“PCC(secEGF~T) -> PCC(secEGF~F) + EGF 2.7e-4”.

Rule  6:  Mutation 

<Cell>(<Mol>~<unmutated>) ->

<Cell>(<Mol>~<mutated>) <mrate>

Remark: For mutant proteins that are constitutively active, we set a very high value for the mutation
rate “mrate”. In this way, the mutated molecule is kept in its mutated state almost all the time.
For example, in our model, the mutation of the oncoprotein RAS in PCCs is captured by

“PCC(RAS~F) -> PCC(RAS~T) 10000”.

Rule  7:  Constantly  over-expressed  extracellular  molecules 

CancerEvn -> CancerEvn + <Mol> <sec_rate>

Remark: We use this type of rule to mimic the situation in which the concentration of an over-expressed
extracellular molecule constantly stays at a high level.

Rule  8:  Degradation  of  extracellular  molecules 

<Mol> -> Null() <deg_rate>

Remark: As before, “Null()” represents dead cells or degraded molecules. For instance, bFGF
in the microenvironment is degraded via “bFGF -> Null() 0.05”.
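Rules 7 and 8 together keep an extracellular molecule near a steady level. The Python sketch below simulates just these two rules with Gillespie's direct method (a plain stochastic simulation algorithm, not the network-free algorithm of NFsim), assuming mass-action propensities: `sec_rate` for Rule 7 and `deg_rate × #Mol` for Rule 8. For this immigration-death process the stationary mean copy number is `sec_rate / deg_rate`, so the time-averaged count returned by the hypothetical `simulate_env_secretion` function should settle near that ratio. The degradation rate 0.05 is the bFGF value quoted above; the secretion rate is an arbitrary illustrative value.

```python
import random


def simulate_env_secretion(sec_rate: float, deg_rate: float, t_end: float, seed: int = 0) -> float:
    """Gillespie direct method for two rules acting on one extracellular molecule.

      Rule 7:  CancerEvn -> CancerEvn + Mol   propensity sec_rate
      Rule 8:  Mol -> Null()                  propensity deg_rate * #Mol

    Returns the time-averaged copy number of Mol over [0, t_end], starting from 0.
    """
    rng = random.Random(seed)
    t, n, area = 0.0, 0, 0.0
    while True:
        a_sec = sec_rate               # constant production (Rule 7)
        a_deg = deg_rate * n           # first-order degradation (Rule 8)
        a_total = a_sec + a_deg        # > 0 because sec_rate > 0
        dt = rng.expovariate(a_total)  # waiting time until the next rule firing
        if t + dt >= t_end:
            area += n * (t_end - t)
            return area / t_end
        area += n * dt
        t += dt
        if rng.random() * a_total < a_sec:  # pick a rule with probability a_i / a_total
            n += 1
        else:
            n -= 1


# Illustrative rates: sec_rate is arbitrary; deg_rate = 0.05 is the bFGF value above.
print(simulate_env_secretion(sec_rate=5.0, deg_rate=0.05, t_end=2000.0))  # close to 100
```

The relaxation time of this process is `1 / deg_rate`, so the run length should be many multiples of it for the time average to approach the stationary mean.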

Rule  9:  Therapeutic  intervention 

<Cell>(<Mol>~<untreated>) ->

<Cell>(<Mol>~<treated>) <treat_rate>

Remark: Given a validated model, intervention rules allow us to evaluate the effectiveness of a
therapy targeting certain molecule(s). In addition, based on the law of mass action, a well-tuned
value of the intervention rate may give a rough indication of the dose of the drug used in
this therapy.

Our extension enables the BioNetGen language to model not only the signaling network
within a single cell, but also interactions among multiple cells. It also allows one to simulate
the dynamics of cell populations, which is crucial for cancer studies. Moreover, describing the
intracellular dynamics in the style of BNs improves the scalability of our method by sidestepping the
difficulty of obtaining values for a large number of model parameters from wet-lab experiments, a
common bottleneck of conventional rule-based languages and ML-Rules. Note that, as with
other rule-based languages, our extension allows different methods of model analysis, since
more than one semantics can be defined for the same syntax.


6.3  Statistical  Model  Checking 

Simulation can recapitulate a number of experimental observations and provide new insights into
the system. However, it is not easy to manually analyze a large number of simulation trajectories,
especially when there is a large set of system properties to be tested. Thus, for our model, we
employ statistical model checking (StatMC), which is a fully automated formal analysis technique.
In this section, we provide a brief, intuitive description of StatMC. The interested reader can
find more details in [132].

In general, given a system property expressed as a Bounded Linear Temporal Logic (BLTL)
[132] formula and the set of simulation trajectories generated by applying the network-free stochastic
simulator NFsim [197] to our rule-based model, StatMC estimates the probability that the
model satisfies the property.

Bounded  Linear  Temporal  Logic 

Before looking into the main ideas of StatMC, we first introduce BLTL formally. As mentioned in
Chapter 3.2, linear temporal logic (LTL) [178], a modal temporal logic with modalities referring
to time, is widely used to formally encode formulae about the future of paths, such as that a condition
will eventually be true or that a condition will be true until another fact becomes true. BLTL extends
LTL with time bounds on temporal operators. For example, the following BLTL formula expresses
the specification “it is not the case that, within 5 seconds, variable $v_0$ will keep the
value 1 and variable $v_1$ will keep the value 0 for 10 seconds”:

$$\neg F^{5} G^{10} (v_0 = 1 \wedge v_1 = 0)$$

where the $F^{5}$ operator means “within the next 5 seconds” (future), $G^{10}$ means “globally for 10 seconds”, and $v_0$
and $v_1$ are state variables of the model.

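The syntax of BLTL is given by:

$$\phi ::= x \sim v \mid \neg\phi \mid \phi_1 \vee \phi_2 \mid \phi_1 \, U^{t} \phi_2$$

where $x \in SV$ (the finite set of state variables), $\sim \in \{<, \le, =, \ge, >\}$, $v \in \mathbb{Q}$, and $t \in \mathbb{Q}_{\ge 0}$. Note
that the operators $\wedge$, $F^{t}$, and $G^{t}$ used above can be defined as follows: $F^{t}\phi = \mathit{True}\, U^{t} \phi$,
$G^{t}\phi = \neg F^{t} \neg\phi$, and $\phi_1 \wedge \phi_2 = \neg(\neg\phi_1 \vee \neg\phi_2)$.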

The semantics of BLTL is defined with respect to traces (or executions) of the model. In this
work, a trace is a simulation trajectory of our multiscale hybrid rule-based model. Formally, a
trace is a sequence of time-stamped state transitions of the form $\sigma = (s_0, t_0), (s_1, t_1), \dots$, indicating
that the system moved to state $s_{i+1}$ after spending duration $t_i$ in state $s_i$. The fact that a trace $\sigma$ satisfies
the BLTL property $\phi$ is denoted by $\sigma \models \phi$. We denote the execution trace starting at state $i$ by $\sigma^i$,
and the value of the state variable $x$ in $\sigma$ at state $i$ by $V(\sigma, i, x)$. The semantics of BLTL
for a trace $\sigma^k$ starting at the $k$th state ($k \in \mathbb{N}$) is defined as follows.

•  $\sigma^k \models x \sim v$ if and only if $V(\sigma, k, x) \sim v$;

•  $\sigma^k \models \neg\phi$ if and only if $\sigma^k \models \phi$ does not hold;

•  $\sigma^k \models \phi_1 \vee \phi_2$ if and only if $\sigma^k \models \phi_1$ or $\sigma^k \models \phi_2$;

•  $\sigma^k \models \phi_1 U^{t} \phi_2$ if and only if there exists $i \in \mathbb{N}$ such that (a) $\sum_{0 \le l < i} t_{k+l} \le t$, (b) $\sigma^{k+i} \models \phi_2$,
and (c) for each $0 \le j < i$, $\sigma^{k+j} \models \phi_1$.
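The semantics above translate almost line by line into a recursive trace checker. The following Python sketch is illustrative only: formulas are represented as hypothetical Python closures, a trace is a list of (state, duration) pairs, and a finite trace that ends before a time bound has been passed raises an error instead of returning a verdict (a convention chosen for this sketch; the text does not specify one). The derived operators $F^t$, $G^t$, and $\wedge$ are built from $U^t$, $\neg$, and $\vee$ exactly as defined above.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Trace = List[Tuple[State, float]]       # [(s_0, t_0), (s_1, t_1), ...]
Formula = Callable[[Trace, int], bool]  # phi(trace, k) means sigma^k |= phi

_OPS = {
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b, "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b, ">": lambda a, b: a > b,
}


def atom(x: str, op: str, v: float) -> Formula:
    """sigma^k |= x ~ v  iff  V(sigma, k, x) ~ v."""
    return lambda tr, k: _OPS[op](tr[k][0][x], v)


def neg(p: Formula) -> Formula:
    return lambda tr, k: not p(tr, k)


def lor(p: Formula, q: Formula) -> Formula:
    return lambda tr, k: p(tr, k) or q(tr, k)


def until(p: Formula, q: Formula, t: float) -> Formula:
    """sigma^k |= p U^t q: q holds at some k+i reached within time t, p holds before it."""
    def check(tr: Trace, k: int) -> bool:
        elapsed, i = 0.0, 0  # elapsed = t_k + ... + t_{k+i-1}
        while elapsed <= t:
            if k + i >= len(tr):
                raise ValueError("trace ends before the time bound; simulate longer")
            if q(tr, k + i):
                return True
            if not p(tr, k + i):
                return False
            elapsed += tr[k + i][1]
            i += 1
        return False
    return check


def true_(tr: Trace, k: int) -> bool:
    return True


def land(p: Formula, q: Formula) -> Formula:      # p AND q = NOT(NOT p OR NOT q)
    return neg(lor(neg(p), neg(q)))


def eventually(t: float, p: Formula) -> Formula:  # F^t p = True U^t p
    return until(true_, p, t)


def globally(t: float, p: Formula) -> Formula:    # G^t p = NOT F^t NOT p
    return neg(eventually(t, neg(p)))


# The example formula NOT F^5 G^10 (v0 = 1 AND v1 = 0), on unit-duration steps.
phi = neg(eventually(5, globally(10, land(atom("v0", "==", 1), atom("v1", "==", 0)))))
trace = [({"v0": 0 if i < 3 else 1, "v1": 0}, 1.0) for i in range(20)]
print(phi(trace, 0))  # False: from step 3 on, v0 = 1 and v1 = 0 hold for at least 10 s
```

Because each operator only looks as far ahead as its time bound, a trace of finite length suffices to decide any BLTL formula, which is what makes BLTL suitable for checking simulation trajectories.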
Statistical Model Checking

In this work, we use StatMC to estimate the probability with which a model satisfies a given
BLTL property. Essentially, given our model M, we first run NFsim on it. For each
generated trajectory of a predefined small length, we run the trace checker against the given
BLTL formula. The trace checker decides either to stop the simulation of this trajectory
and pass the checking result to the statistical tester, or to continue the simulation of this
trace for another k steps. The checking results of multiple trajectories are used by the statistical
tester to estimate the probability. The tester decides when to stop the whole simulation and
returns the final estimated probability within a preset small error bound. The whole checking process
is illustrated in Figure 6.2.
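The interplay between the simulator and the trace checker can be sketched for a single operator, $F^t$. In the hypothetical Python function below, `simulate_more(k)` stands in for NFsim and returns the next k steps of one trajectory as (state, duration) pairs; the checker consumes steps as they arrive and requests k more only while the verdict is still open, i.e., while the trajectory ends before the time bound has been passed. A full trace checker would apply the same idea to every bounded operator of the formula.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Step = Tuple[State, float]  # (state s_i, duration t_i spent in s_i)


def check_eventually_incrementally(
    simulate_more: Callable[[int], List[Step]],
    pred: Callable[[State], bool],
    bound: float,
    k: int,
) -> Tuple[bool, int]:
    """Decide F^bound(pred) on one trajectory, simulating only as much as needed.

    Returns (verdict, number of steps simulated).
    """
    trace: List[Step] = []
    pos, elapsed = 0, 0.0  # elapsed = t_0 + ... + t_{pos-1}
    while True:
        trace.extend(simulate_more(k))  # extend the trajectory by k steps
        while pos < len(trace) and elapsed <= bound:
            state, duration = trace[pos]
            if pred(state):
                return True, len(trace)  # witness found within the bound
            elapsed += duration
            pos += 1
        if elapsed > bound:
            return False, len(trace)  # bound passed without a witness
        # Otherwise the verdict is still open: loop and simulate k more steps.


def make_counter_simulator() -> Callable[[int], List[Step]]:
    """Toy deterministic 'simulator': x counts steps, each lasting 1 time unit."""
    produced = 0

    def simulate_more(k: int) -> List[Step]:
        nonlocal produced
        steps = [({"x": float(produced + j)}, 1.0) for j in range(k)]
        produced += k
        return steps

    return simulate_more


print(check_eventually_incrementally(make_counter_simulator(), lambda s: s["x"] >= 25, 100.0, 10))
# (True, 30)
print(check_eventually_incrementally(make_counter_simulator(), lambda s: s["x"] >= 500, 100.0, 10))
# (False, 110)
```

Simulating in chunks of k steps trades a little wasted simulation at the end of a trajectory for far fewer round trips between the checker and the simulator.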

In detail, since the underlying semantic model of the stochastic simulation method NFsim that
we use for our model is essentially a discrete-time Markov chain, we need to verify stochastic
models. StatMC treats the verification problem for stochastic models as a statistical inference
problem: it uses randomized sampling to generate traces (or simulation trajectories) from the system
model, and then performs model checking and statistical analysis on those traces. For a (closed)
stochastic model and a BLTL property $\phi$, the probability $p$ that the model satisfies $\phi$ is well defined
(but unknown in general). For a fixed $0 < \theta < 1$, we ask either whether $p < \theta$, or what the value of
$p$ is. In StatMC, the first question is solved via hypothesis testing methods, and the second
via estimation techniques. Intuitively, hypothesis tests are probabilistic decision procedures, i.e.,
algorithms with a yes/no reply that may give wrong answers. Estimation techniques instead
compute (probabilistic) approximations of the unknown probability $p$. The main assumption of
StatMC is that, given a BLTL property $\phi$, the behavior of a (closed) stochastic model can be
described by a Bernoulli random variable with parameter $p$, where $p$ is the probability that the system
satisfies $\phi$. It is known that discrete-time Markov chains satisfy this requirement [211]. Therefore,
StatMC can be applied in our setting. More specifically, given a system execution $\sigma$ and a
BLTL formula $\phi$, we have $Prob\{\sigma \mid \sigma \models \phi\} = p$, and the Bernoulli random variable mentioned
above is the function $M$ defined by $M(\sigma) = 1$ if $\sigma \models \phi$, and $M(\sigma) = 0$
otherwise. Therefore, $M$ is 1 with probability $p$ and 0 with probability $1 - p$. In general,
StatMC works by first obtaining samples of $M$ and then applying statistical techniques to these
samples to solve the verification problem.
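Section 6.4 reports results obtained with Bayesian sequential estimation (error bound 0.01, coverage probability 0.99, uniform Beta(1, 1) prior). The Python sketch below illustrates that style of estimator on samples of $M$: after each sample it forms the Beta posterior, centers an interval of half-width δ on the posterior mean (shifting the interval inside [0, 1] at the boundaries), and stops once the posterior probability of that interval reaches the coverage probability. It is written from the general description of Bayesian interval estimation for statistical model checking, not from the StatMC code used in this work: the function names are hypothetical, the Beta CDF is computed with the standard continued-fraction method for the regularized incomplete beta function, and `sample()` stands in for "simulate one trajectory and return whether it satisfies φ". With a Beta(1, 1) prior the posterior mean is (x + 1)/(n + 2); for example, 256 satisfying traces out of 256 give 257/258 ≈ 0.9961, the value listed in Table 6.1.

```python
import math
import random
from typing import Callable, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 1000, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def bayes_estimate(sample: Callable[[], bool], delta: float = 0.01, coverage: float = 0.99,
                   alpha: float = 1.0, beta: float = 1.0,
                   max_samples: int = 1_000_000) -> Tuple[float, int, int]:
    """Sequential Bayesian estimate of p = Pr(M = 1); returns (p_hat, n, successes)."""
    n = x = 0
    p_hat = alpha / (alpha + beta)
    while n < max_samples:
        n += 1
        x += 1 if sample() else 0
        a, b = x + alpha, n - x + beta       # Beta posterior parameters
        p_hat = a / (a + b)                  # posterior mean
        t0, t1 = p_hat - delta, p_hat + delta
        if t1 > 1.0:
            t0, t1 = 1.0 - 2 * delta, 1.0
        elif t0 < 0.0:
            t0, t1 = 0.0, 2 * delta
        if beta_cdf(t1, a, b) - beta_cdf(t0, a, b) >= coverage:
            break                            # the interval holds enough posterior mass
    return p_hat, n, x


# A property that every trace satisfies: this estimator stops after 227 samples.
print(bayes_estimate(lambda: True))
# A Bernoulli(0.4) property, with a looser error bound to keep the run short.
rng = random.Random(7)
print(bayes_estimate(lambda: rng.random() < 0.4, delta=0.05))
```

This sketch tests the stopping condition after every sample; an implementation that draws samples in batches will stop at a batch boundary instead, so the sample counts it reports can differ from those of a one-at-a-time loop.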


6.4  Results  and  Discussion 

In this section, we present and discuss formal analysis results for our pancreatic cancer microenvironment
model. The model file is available at http://www.cs.cmu.edu/~qinsiw/mpc_model.bngl.
All the experiments reported below were conducted on a machine with a 1.7 GHz
Intel Core i7 processor and 8 GB RAM, running Ubuntu 14.04.1 LTS. In our experiments, we
use Bayesian sequential estimation with an estimation error bound of 0.01, coverage probability
0.99, and a uniform prior ($\alpha = \beta = 1$). The time bounds and thresholds given in the following
properties are determined by considering the model's simulation results. The parameters in our model
include the initial state (e.g., the abundance of extracellular molecules) and reaction rate constants. The
initial state was provided by biologists based on wet-lab measurements. The rate constants were
estimated based on the general ones in the textbook [73]. The results in Scenarios I and II demonstrate
that, using these parameters, the model is able to reproduce key observations reported in the
literature. We also performed a sensitivity analysis, and the results show that the system behavior
is robust to most of the parameters (the two sensitive parameters have been labeled in our model
file).

Figure 6.2: The statistical model checking process for the pancreatic cancer microenvironment model

Scenario  I:  mutated  PCCs  with  no  treatments 

In  scenario  I,  we  validate  our  model  by  studying  the  role  of  PSCs  in  the  PC  development. 

Property 1: This property estimates the probability that the population of PCCs will eventually
reach and remain at a high level.

$$Prob_{=?}\{(\mathit{PCCtot} = 10) \wedge F^{1200} G^{100} (\mathit{PCCtot} > 200)\}$$

First, we examine the impact of the presence of PSCs on the dynamics of the PCC population.
As shown in Table 6.1, with PSCs, the probability that the number of PCCs reaches and stays at
a high level (Pr = 0.9961) is much higher than when PSCs are absent (Pr = 0.4053). This
indicates that PSCs promote PCC proliferation during the progression of PC, which is consistent
with experimental findings [16, 177, 214].

Property 2: This property estimates the probability that the number of migrated PSCs will
eventually reach and remain at a high level.

$$Prob_{=?}\{(\mathit{MigPSC} = 0) \wedge F^{1200} G^{100} (\mathit{MigPSC} > 40)\}$$

We then study the impact of PCCs on PSCs. As shown in Table 6.1, without PCCs, it is quite
unlikely (Pr = 0.1191) for quiescent PSCs to be activated. In contrast, when PCCs are present, the
probability of PSCs becoming active (Pr = 0.9961) approaches 1. This confirms the observation [107] that,
during the development of PC, PSCs are activated by growth factors, cytokines, and oxidant
stress secreted or induced by PCCs.

Property 3: This property aims to estimate the probability that the number of PCCs entering the apoptosis phase will be larger than the number of PCCs starting the autophagy process, and that this situation will eventually be reversed.

$$\mathrm{Prob}_{=?}\{F^{400} (G^{300} (\mathit{ApoPCC} > 50 \wedge \mathit{AutoPCC} < 50) \wedge F^{700} G^{300} (\mathit{ApoPCC} < 50 \wedge \mathit{AutoPCC} > 50))\}$$

We are also interested in the mutually exclusive relationship between apoptosis and autophagy in PCCs reported in [119, 157]. Specifically, as PC progresses, apoptosis first dominates autophagy, and autophagy then takes over after a certain amount of time. This situation is described by Property 3, and its estimated probability is close to 1 (see Table 6.1).

Property 4: This property aims to estimate the probability that, at all times, once the population of activated PSCs reaches a high level, the number of migrated PSCs will also increase.

$$\mathrm{Prob}_{=?}\{G^{1600} (\mathit{ActPSC} > 10 \rightarrow F^{100} (\mathit{MigPSC} > 10))\}$$
| Property | Estimated Prob. | # Successes | # Samples | Time (s) | Note |
|---|---|---|---|---|---|
| Scenario I: mutated PCCs with no treatments | | | | | |
| 1 | 0.4053 | 10585 | 26112 | 208.91 | without PSCs |
| | 0.9961 | 256 | 256 | 1.83 | with PSCs |
| 2 | 0.1191 | 830 | 6976 | 49.69 | without PCCs |
| | 0.9961 | 256 | 256 | 1.75 | with PCCs |
| 3 | 0.9961 | 256 | 256 | 5.21 | - |
| 4 | 0.9961 | 256 | 256 | 4.38 | - |
| Scenario II: mutated PCCs with different existing treatments | | | | | |
| 5 | 0.0004 | 0 | 2304 | 17.13 | cetuximab and erlotinib |
| | 0.0012 | 10 | 9152 | 68.67 | gemcitabine |
| | 0.7810 | 8873 | 11360 | 114.25 | nab-paclitaxel |
| | 0.8004 | 7753 | 9686 | 73.83 | ruxolitinib |
| Scenario III: mutated PCCs with blocking out on possible target(s) | | | | | |
| 6 | 0.0792 | 38363 | 484128 | 3727.99 | without inhibiting ERK in PSCs |
| | 0.9822 | 2201 | 2240 | 17.37 | with inhibiting ERK in PSCs |
| 7 | 0.1979 | 3409 | 17232 | 136.39 | without inhibiting ERK in PSCs |
| | 0.9961 | 256 | 256 | 2.01 | with inhibiting ERK in PSCs |
| 8 | 0.2029 | 2181 | 10752 | 92.57 | without inhibiting MDM2 in PSCs |
| | 0.9961 | 256 | 256 | 2.18 | with inhibiting MDM2 in PSCs |
| 9 | 0.0004 | 0 | 2304 | 15.77 | without inhibiting RAS in PCCs and ERK in PSCs |
| | 0.9961 | 256 | 256 | 3.15 | with inhibiting RAS in PCCs and ERK in PSCs |
| 10 | 0.9797 | 1349 | 1376 | 11.98 | without inhibiting STAT in PCCs and NFkB in PSCs |
| | 0.1631 | 1476 | 9056 | 81.61 | with inhibiting STAT in PCCs and NFkB in PSCs |

Table  6.1:  Statistical  model  checking  results  for  properties  under  different  scenarios 
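A quick arithmetic check on Table 6.1: every reported estimate lies within 1e-4 of (x + 1)/(n + 2), where x is the number of successful samples and n the total number of samples. That value is the posterior mean of a Beta(1, 1) (uniform) prior updated with x successes in n Bernoulli trials, the point estimate used in Bayesian estimation for statistical model checking. The raw ratio x/n does not always agree (e.g. 2201/2240 ≈ 0.9826 versus the reported 0.9822). The sketch below only recomputes this point estimate, assuming that estimator; the stopping rule that fixed each sample size is not reproduced.

```python
from typing import List, Tuple


def posterior_mean(successes: int, samples: int,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Mean of the Beta(alpha + successes, beta + samples - successes) posterior."""
    return (successes + alpha) / (samples + alpha + beta)


# (# successful samples, # samples, estimate reported in Table 6.1)
TABLE_6_1: List[Tuple[int, int, float]] = [
    (10585, 26112, 0.4053), (256, 256, 0.9961), (830, 6976, 0.1191),
    (0, 2304, 0.0004), (10, 9152, 0.0012), (8873, 11360, 0.7810),
    (7753, 9686, 0.8004), (38363, 484128, 0.0792), (2201, 2240, 0.9822),
    (3409, 17232, 0.1979), (2181, 10752, 0.2029), (1349, 1376, 0.9797),
    (1476, 9056, 0.1631),
]


def max_deviation(rows: List[Tuple[int, int, float]]) -> float:
    """Largest gap between a reported estimate and the uniform-prior posterior mean."""
    return max(abs(posterior_mean(x, n) - p) for x, n, p in rows)


if __name__ == "__main__":
    for x, n, p in TABLE_6_1:
        print(f"{x:>6}/{n:<7} posterior mean {posterior_mean(x, n):.5f}  table {p:.4f}")
    print(f"max deviation: {max_deviation(TABLE_6_1):.6f}")
```

This also explains rows such as "0 successes out of 2304" being reported as 0.0004 rather than 0.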


One reason why PC is hard to cure is that activated PSCs move towards mutated PCCs and form a cocoon around the tumor cells, which can protect the tumor from attacks by therapies [16, 86]. We investigate this by checking Property 4 and obtain an estimated probability approaching 1 (see Table 6.1).
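Property 4 has a different shape from the others: a bounded response pattern, read here as G^1600 (ActPSC > 10 → F^100 (MigPSC > 10)). Under the same assumption of one trace entry per time unit, a sketch of checking one trace is below. It refuses to judge traces that are too short to cover every bounded window. The function names are illustrative, not part of the tool chain used in this work.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Pred = Callable[[State], bool]


def globally_responds(trace: List[State], g_bound: int, f_bound: int,
                      trigger: Pred, response: Pred) -> bool:
    """G^g_bound (trigger -> F^f_bound response) evaluated at position 0."""
    if len(trace) <= g_bound + f_bound:
        raise ValueError("trace too short to decide the bounded formula")
    for i in range(g_bound + 1):
        if trigger(trace[i]):
            # Some step in the window i..i+f_bound must satisfy the response.
            if not any(response(trace[j]) for j in range(i, i + f_bound + 1)):
                return False
    return True


def property4(trace: List[State]) -> bool:
    """G^1600 (ActPSC > 10 -> F^100 (MigPSC > 10))."""
    return globally_responds(trace, 1600, 100,
                             lambda s: s["ActPSC"] > 10,
                             lambda s: s["MigPSC"] > 10)
```

With activation at step 50, the property holds if migration exceeds the threshold by step 150 and fails if it only does so at step 200.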
Scenario II: mutated PCCs with different existing treatments

Property 5: This property aims to estimate the probability that the population of PCCs will eventually drop to, and remain at, a low level.

$$\mathrm{Prob}_{=?}\{(\mathit{PCC}_{tot} = 10) \wedge F^{1200} G^{400} (\mathit{PCC}_{tot} < 100)\}$$


Property 5 means that, after some time, the population of PCCs can be maintained at a comparatively low level, implying that PC is under control. We now consider five drugs that are widely used in PC treatment (cetuximab, erlotinib, gemcitabine, nab-paclitaxel, and ruxolitinib) and estimate the probability that each of them satisfies Property 5. As shown in Table 6.1, a monoclonal antibody targeting EGFR (cetuximab), as well as direct inhibition of EGFR (erlotinib), broadly does not provide a survival benefit in PC. Inhibition of the MAPK pathway (gemcitabine) has also not been promising. These results are consistent with clinical feedback from patients [3]. In contrast, strategies aimed at depleting the PSCs in PC (i.e., nab-paclitaxel) can be successful (with an estimated probability of 0.7810). Inhibition of Jak/Stat (ruxolitinib) can also be very promising (with an estimated probability of 0.8004). These results are supported by [213] and [127], respectively.


Scenario  III:  mutated  PCCs  with  blocking  out  on  possible  target(s) 

Scenarios I and II have demonstrated the descriptive and predictive power of our model. In Scenario III, we use the validated model to identify new therapeutic strategies targeting molecules in PSCs. Here we report four potential target(s) of interest from our screening.

Property 6: This property aims to estimate the probability that the number of PSCs will eventually drop to, and remain at, a low level.

$$\mathrm{Prob}_{=?}\{(\mathit{PSC}_{tot} = 5) \wedge F^{1200} G^{400} (\mathit{PSC}_{tot} < 30)\}$$

Property 7: This property aims to estimate the probability that the population of migrated PSCs will eventually stay at a low level.

$$\mathrm{Prob}_{=?}\{(\mathit{MigPSC} = 0) \wedge F^{1200} G^{100} (\mathit{MigPSC} < 30)\}$$

The verification results for these two properties (Table 6.1) suggest that inhibiting ERK in PSCs not only lowers the population of PSCs but also inhibits PSC migration. The former effect can indirectly reduce the support that PSCs provide to PC progression. The latter can prevent PSCs from moving towards PCCs and forming a cocoon that protects PCCs against cancer treatments.

Property 8: This property aims to estimate the probability that the number of PSCs entering the proliferation phase will eventually be less than the number of PSCs starting the apoptosis programme, and that this situation will persist.

$$\mathrm{Prob}_{=?}\{F^{1200} G^{400} ((\mathit{PSC}_{Pro} - \mathit{PSC}_{Apop}) < 0)\}$$

The increased probability (from 0.2029 to 0.9961, as shown in Table 6.1) indicates that inhibiting MDM2 in PSCs may reduce the number of PSCs by inhibiting their proliferation and/or promoting their apoptosis. Like the first effect of inhibiting ERK in PSCs, it can help treat PC by alleviating the burden caused by PSCs.
Property 9: This property aims to estimate the probability that the concentration of bFGF will eventually stay at a low level.

$$\mathrm{Prob}_{=?}\{F^{1200} G^{400} (\mathit{bFGF} < 100)\}$$


As discussed for Properties 5, 6, and 7, inhibiting RAS in PCCs can lower the number of PCCs, and downregulating ERK in PSCs can inhibit their proliferation and migration. Beyond these effects, we find that, when RAS in PCCs and ERK in PSCs are inhibited simultaneously, the concentration of bFGF in the microenvironment drops (see Table 6.1). As bFGF is a key molecule that induces proliferation of both cell types, targeting RAS in PCCs and ERK in PSCs at the same time could synergistically improve PC treatment.


Property 10: This property aims to estimate the probability that the concentration of VEGF will eventually reach, and remain at, a high level.


$$\mathrm{Prob}_{=?}\{F^{400} G^{100} (\mathit{VEGF} > 200)\}$$


Furthermore, inhibiting STAT in PCCs and NFkB in PSCs simultaneously postpones and lowers the secretion of VEGF (see Table 6.1). VEGF plays an important role in the angiogenesis and metastasis of pancreatic tumors. Thus, the combined inhibition of STAT in PCCs and NFkB in PSCs may be another potential strategy for PC therapy.
Chapter 7

Joint  Efforts  of  Formal  Methods  and 
Machine  Learning  to  Automate  Biological 
Model  Design 


The work discussed in previous chapters has demonstrated that constructing and analyzing appropriate models can help explain the biological systems we study by presenting their core dynamics, which allows us to discover new questions and challenge existing theories. However, the creation of models often requires intense human effort. For example, to construct the multicellular and multiscale model of the pancreatic cancer microenvironment, I read about 300 related papers and held weekly meetings with experts to understand the behavior of the system and to resolve conflicting statements found in distinct publications. This laborious process makes model development slow, let alone validating and extending models with findings reported in newly published literature. In order to allow researchers to reuse existing findings about a biological system in a comprehensive and timely manner, we need a framework that automates information extraction from existing literature, supports correct and smart information assembly, and enables accurate and efficient model analysis.

Over the last decade, several automated reading engines have been developed and successfully adopted to extract interactions among biological entities from literature (e.g. [208]). These auto-readers are quite efficient: they are capable of finding hundreds of thousands of interactions in thousands of papers within a few hours [208]. Together with a given set of keywords, these auto-readers promise to offer a vast amount of relevant system information extracted from published papers, which can then be used for model assembly.

However, in order to incorporate these pieces of knowledge into a model accurately and efficiently, selection methods are needed to choose the correct and useful ones from the huge amount of extracted information. The integrated model should satisfy important system properties. When a baseline model is used as the initial model, the extended model should retain the system properties satisfied by the baseline model, or even reflect other system properties that the baseline model fails to satisfy. Moreover, it is also useful to detect extended models where minimal interventions in the model can lead to significant changes in outcomes.

To achieve this, in this chapter we propose and implement a pipeline framework, LEaRn, as illustrated in Figure 7.1. As the first step, given related papers and a set of keywords, the auto-reader REACH [208] is used to extract causal relations involving the biological entities of interest. Then, together with a baseline model, the extracted relations are preselected using different heuristics, which are discussed in detail in a later section. A set of extended models is generated through this preselection phase. Afterwards, these generated models are checked against a set of system properties as the final selection standard. This pipeline offers a platform to utilize previously reported findings about a system, to validate existing knowledge, and to test new hypotheses. To demonstrate the feasibility of the proposed framework, we present a case study of the pancreatic cancer microenvironment.


Figure 7.1: The framework of LEaRn (pipeline stages: Auto-Reader REACH; Selection; Statistical Model Checking; Checking Results Evaluation)


7.1  Outputs  from  Auto-reading 

Auto-reading  output  and  data  structure 

In the current version of our pipeline framework LEaRn, the biological systems that we study are cellular signaling networks. Cell signaling is part of a complex system of communication that governs basic activities of cells and coordinates cell actions. Signal transduction along a pathway occurs when an extracellular molecule activates a specific receptor located on the cell surface or, sometimes, inside the cell. This receptor then triggers a chain of events within the cell, creating a response. Multiple pathways interact with one another to form a network.

Within published papers, descriptions of interactions forming a signaling network can be classified into three groups according to how complete the reported information about an interaction is. The first group contains qualitative relations. One example is “the signaling enzymes encoded by PIK3CA and BRAF are, in part, regulated by direct binding to activated forms of the Ras protein” in [189]. From this type of relation, we can only learn that one biological entity has either a positive or a negative impact on another entity. Relations in the second group, semi-quantitative relations, also provide rough information about the amounts of the involved entities, the extent of an impact, the location where an interaction takes place, and so on. For instance, we found “we treated CHO-K1 cells expressing EGFR T669A with HRG ligand to induce maximal ERBB3 phosphorylation” in [207]. The last group, quantitative relations, offers complete and precise information about all the details of an interaction. Relations of this kind are what we would ideally expect from auto-reading. Unfortunately, in reality, only a small part of auto-reading outputs can be classified as quantitative relations.

The automated reading engine REACH [208] adopted in our pipeline framework extracts events in the form of frames, each containing an interaction between two biological entities. Since only a very small fraction of extracted interactions carries complete quantitative information, in LEaRn we choose to represent each interaction as a triple < P, Q, +/- >. Within a triple, P and Q are biological entities, such as genes, proteins, and cell functions. “+” means that P has a positive impact on Q, such as activation or phosphorylation, while “-” indicates that the impact of P on Q is negative, such as inhibition or dephosphorylation. We use this data structure to capture the discrete structure of signaling networks.
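A minimal Python sketch of the < P, Q, +/- > triple is shown below. The event-type vocabulary in EVENT_SIGN and the to_triple helper are hypothetical stand-ins for illustration; they do not reproduce REACH's actual output format. Making the triple a frozen dataclass keeps it hashable, so repeated extractions of the same relation collapse when stored in a set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sign(Enum):
    POSITIVE = "+"   # e.g. activation, phosphorylation
    NEGATIVE = "-"   # e.g. inhibition, dephosphorylation


@dataclass(frozen=True)
class Interaction:
    """One extracted causal relation <P, Q, +/->: `regulator` (P) acts on `target` (Q)."""
    regulator: str
    target: str
    sign: Sign


# Hypothetical mapping from extracted event types to signs (illustration only).
EVENT_SIGN = {
    "activation": Sign.POSITIVE,
    "phosphorylation": Sign.POSITIVE,
    "inhibition": Sign.NEGATIVE,
    "dephosphorylation": Sign.NEGATIVE,
}


def to_triple(regulator: str, target: str, event_type: str) -> Optional[Interaction]:
    """Collapse an event into a triple; None if a participant is missing or the type is unmapped."""
    if not regulator or not target or event_type not in EVENT_SIGN:
        return None
    return Interaction(regulator, target, EVENT_SIGN[event_type])
```

Dropping events with a missing participant mirrors the filtering of incomplete reading outputs described later in this chapter.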


Modeling  formalism  for  baseline  and  extended  models 

To be consistent with the format used to represent extracted relations, LEaRn uses Boolean Networks (BNs) to describe both the baseline model and the extended models. As discussed in Chapter 2.2, when BNs are used to capture the dynamics of signaling networks, each node in a BN represents a biological entity in the corresponding signaling network and takes binary values. The state evolution of a node from discrete time point t to t + 1 is described by a Boolean update function over the node and its parent nodes. To recall how BNs can be used to model signaling networks, we provide the following simple example. In the example pathway (see Figure 7.2(c)), v1 is activated by v2, v2 is activated by both v1 and v3, and v3 is inhibited by v1. When the pathway is described as a Boolean network, v1, v2, and v3 are treated as Boolean variables whose next-time values are computed using the Boolean functions listed in Figure 7.2(d).
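Figure 7.2(d) is not reproduced in the text, so the update rules below are an assumption: v1' = v2, v2' = v1 OR v3 (reading “activated by both” as either activator sufficing), and v3' = NOT v1. Under that assumption, and updating all nodes synchronously from t to t + 1, the example network can be simulated as follows.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Assumed Boolean update functions for the example network of Figure 7.2.
RULES: Dict[str, Callable[[State], bool]] = {
    "v1": lambda s: s["v2"],             # v1 is activated by v2
    "v2": lambda s: s["v1"] or s["v3"],  # v2 is activated by v1 and v3 (OR assumed)
    "v3": lambda s: not s["v1"],         # v3 is inhibited by v1
}


def sync_step(state: State) -> State:
    """Compute every node's value at t + 1 from the values at t."""
    return {node: rule(state) for node, rule in RULES.items()}


def run(state: State, steps: int) -> List[State]:
    """Return the trajectory state_0, ..., state_steps."""
    trajectory = [state]
    for _ in range(steps):
        state = sync_step(state)
        trajectory.append(state)
    return trajectory
```

Starting from v1 = False, v2 = False, v3 = True, this trajectory reaches the fixed point v1 = True, v2 = True, v3 = False after three steps. The synchronous scheme is chosen only for brevity; the model-checking stage of LEaRn uses a random sequential update scheme instead.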


7.2  Preselection  on  Causal  Relations  and  Model  Generation 

Classification  of  causal  relations 

Given a set of well-formatted causal relations, model assembly can be carried out with or without a baseline model. When there is no baseline model, extracted relations usually need to be scored by counting the occurrences of a relation across multiple papers, the total citations of the papers reporting it, and so on. After filtering out incomplete or duplicated relations and handling conflicting ones, a set of relations can be chosen, according to their scores, their relationships with the biological entities of interest, and their connectivity, to construct the final model(s). In the current version of LEaRn, we consider the situation where a validated baseline model exists.

Figure 7.2: Boolean network model of a simple signaling network

Given a baseline model, extracted relations can first be classified into three types according to their relationship to the model, as follows.

•  Corroborations:  an  interaction  extracted  from  papers  confirms  one  interaction  existing  in  the 
baseline  model. 

•  Extensions: a relation learned from literature adds new information to the baseline model.
According  to  the  type  of  newly  added  information,  relations  in  the  “Extension”  category  can 
be  further  classified  into  the  following  three  groups. 

-  When the new information is a new connection between two biological entities within the baseline model, both elements of the relation are already in the baseline model. Adding this kind of extension usually has a direct influence on the behavior of the resulting model, as structural changes to a signaling network may lead to significant differences in regulatory behavior.

-  When the new information is a relation between a biological entity in the baseline model and a new entity, there are two cases. When the regulated element is not in the baseline model, it simply hangs from a pathway without directly influencing the model. When the regulator is outside the baseline model, it can act as a new model input, allowing for additional network control.
-  When the new information is a relation between two biological entities not mentioned in the baseline model, adding such an interaction alone to the baseline model will not affect the behavior of the model. However, when multiple extensions connecting this interaction with elements of the baseline model are considered together, additional regulatory pathways are constructed that can affect the model behavior.

•  Contradictions: an interaction from auto-reading suggests a mechanism that conflicts with one in the baseline model. For example, in the baseline model, entity A activates B, while an extracted relation says the opposite, i.e. A inhibits B in this system.
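The classification above can be sketched as a function that compares one extracted triple with the signed edge set of the baseline model. Treating a contradiction as the same (P, Q) pair with the opposite sign is the simplest reading of the definition and is an assumption here; the labels are illustrative.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (regulator P, target Q, sign "+" or "-")


def classify(relation: Edge, baseline: Set[Edge]) -> str:
    """Relate one extracted triple to a baseline model given as a set of signed edges."""
    p, q, sign = relation
    nodes = {x for a, b, _ in baseline for x in (a, b)}
    if relation in baseline:
        return "corroboration"
    opposite = "-" if sign == "+" else "+"
    if (p, q, opposite) in baseline:
        return "contradiction"
    if p in nodes and q in nodes:
        return "extension: new edge between baseline entities"
    if p in nodes:
        return "extension: new regulated entity"
    if q in nodes:
        return "extension: new regulator (model input)"
    return "extension: both entities new"
```

Applied to the example of Figure 7.2 (edges v2 to v1, v1 to v2, v3 to v2 positive, and v1 to v3 negative), an extracted “v1 activates v3” would be flagged as a contradiction, while “v3 activates v1” would be an extension between existing entities.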

In this work, we only consider relations belonging to the “Extension” group, that is, new interactions that can be added to the model. Using corroborations to add weights or scores to relations, and handling contradictions, are part of our future work.

Heuristics  for  model  generation 

Although we only consider extracted relations belonging to the “Extension” group, there is still a vast number of interactions. Given n newly learned causal relations, there are 2^n possible extended models if we enumerate every configuration of whether or not to add each new relation. This exponentially growing number is impossible to handle; therefore, we need heuristic methods to search for suitable configurations of model extensions. Note that there are several ways to design selection heuristics, such as using scoring functions or considering the connectivity of the extended models. Here, we introduce four heuristics that consider whether and how the selected relations influence the satisfiability of the resulting model against a given set of important system properties.

Since the given set of system properties is key to our selection heuristics, we define all the biological entities mentioned in this set of properties as “elements of interest”. Note that, in general, the set of “elements of interest” can also be defined by the user, depending on the questions asked or hypotheses tested. Next, we introduce the concept of a “layer” for each biological entity mentioned either in the baseline model or in extracted interactions. The layer of an entity is the length of the shortest path connecting it with any element in the set of “elements of interest”, and it can be computed iteratively. That is, the “elements of interest” are in layer 0. Layer 1 contains the direct regulators of elements in layer 0 that are not themselves in layer 0. Similarly, layer i + 1 contains the direct regulators of elements in layer i that are not in any of the layers 0, 1, …, i. With these two concepts, we propose four heuristics to select extension configurations.
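The iterative layer definition amounts to a breadth-first search that walks edges backwards, from targets to their regulators, starting at the elements of interest. A sketch, assuming edges are (regulator, target, sign) triples covering both the baseline model and the extracted interactions; entities with no regulatory path to an element of interest receive no layer.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (regulator, target, sign)


def compute_layers(edges: Iterable[Edge], elements_of_interest: Set[str]) -> Dict[str, int]:
    """Layer of each entity = shortest regulator-path length to any element of interest."""
    regulators: Dict[str, Set[str]] = {}
    for p, q, _ in edges:
        regulators.setdefault(q, set()).add(p)
    layer = {e: 0 for e in elements_of_interest}
    queue = deque(elements_of_interest)
    while queue:
        node = queue.popleft()
        for reg in regulators.get(node, ()):
            if reg not in layer:  # first visit in BFS order is the shortest distance
                layer[reg] = layer[node] + 1
                queue.append(reg)
    return layer
```

Because BFS processes layer i completely before layer i + 1, an entity is never assigned a larger layer than the shortest path allows, which matches the “not in layers 0, 1, …, i” condition.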

•  Cumulative parent-set with direct extensions (CD(n)): using this heuristic, given a layer n, we select all extracted interactions that contain any element from layer 0 up to layer n, and add the selected relations to the baseline model. One extended model is generated for each value of n, until the set of selected relations cannot be extended any further.

•  Non-cumulative parent-set with direct extensions (ND(n)): given a layer n, unlike CD, ND only chooses relations containing elements in layer n. One reason to use this heuristic is that it is sometimes interesting to study the influence of only the nth layer on the resulting model, so as to identify individual extension layers that may cause significant changes in performance with respect to different properties. The other reason is that biological entities mentioned in extracted interactions sometimes affect the “elements of interest” indirectly through long interaction paths.

•  Cumulative parent-set with indirect extensions (CI(n)): for the previous two heuristics, we find the biological entities in each layer only by looking for direct regulators of elements in the previous layer. When using CI, we look for indirect regulators as well. Specifically, given a layer n, we select all extracted relations that have either a direct or an indirect impact on elements in layer n. This heuristic usually includes pathways outside the baseline model more often than the other methods.

•  Non-cumulative parent-set with indirect extensions (NI(n, m)): this method combines CI and ND. The goal of this heuristic is to provide information about the influence of relations belonging to m layers containing indirect edges, starting from the nth layer. In other words, we first find the biological entities in the nth layer using the ND heuristic, and then perform the CI operation m times to find all interactions of interest. We propose this heuristic because it is important to consider the impact of interactions happening in nearby cells on the signaling network formed by the “elements of interest”.
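Given a layer assignment, the two direct-extension heuristics reduce to filters over the extracted relations. The sketch below covers CD(n) and ND(n) only, and assumes that a relation “contains” an element when either its regulator or its target has the required layer; CI and NI additionally require the indirect-regulator search and are not shown. The function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (regulator, target, sign)


def _touches(edge: Edge, layer: Dict[str, int], lo: int, hi: int) -> bool:
    """True if either participant of `edge` has a layer in [lo, hi]."""
    return any(x in layer and lo <= layer[x] <= hi for x in edge[:2])


def select_cd(extracted: List[Edge], layer: Dict[str, int], n: int) -> List[Edge]:
    """CD(n): relations containing an element from layer 0 up to layer n."""
    return [e for e in extracted if _touches(e, layer, 0, n)]


def select_nd(extracted: List[Edge], layer: Dict[str, int], n: int) -> List[Edge]:
    """ND(n): relations containing an element of layer n only."""
    return [e for e in extracted if _touches(e, layer, n, n)]


def extend(baseline: Set[Edge], selected: List[Edge]) -> Set[Edge]:
    """One extended model: the baseline edges plus the selected relations."""
    return set(baseline) | set(selected)
```

Each value of n yields one candidate edge set, so sweeping n produces the family of extended models that is then passed to statistical model checking.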


7.3  Selection  using  Statistical  Model  Checking 

After a set of extended models has been obtained using the above heuristics, the final model selection is carried out by applying statistical model checking to each generated model against a set of important system properties of the biological system. For the BN representations of these extended models, simulating such logical models is known to recapitulate certain experimental observations [164], but manually verifying simulation results against the given properties is tedious and error-prone, especially when the number of models or properties is large. A feasible way to tackle this problem is to use formal methods. In our pipeline framework, statistical model checking (StatMC) is adopted. As discussed in Chapter 6.3, StatMC, a fully automated formal analysis technique, can be used to estimate the probability that a model satisfies a given bounded LTL property. As illustrated in Figure 6.2 in Chapter 6.3, StatMC starts by carrying out stochastic simulation of the given model. In this work, the publicly available stochastic simulator m\im is used on our extended BN models. The simulator offers several simulation schemes that account for the different timing and element-update behaviors occurring in biological systems. The scheme we use in this work is called “Random Sequential Step-Based Uniform”: in each discrete simulation step, one element of the model is chosen at random, and its Boolean update function is applied to compute the element's new value. The upper bound on the number of sequential steps is defined before the stochastic simulation starts. With the uniform update approach, all model elements have the same probability of being chosen. The values of the variables at each step, from the initial state up to the given bound, are recorded for later trace checking against a given bounded LTL property.
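For illustration, the “Random Sequential Step-Based Uniform” scheme can be re-implemented in a few lines. This is a hypothetical sketch of the scheme as described above, not the simulator used in this work or its API: `rules` maps each element to its Boolean update function, and each step updates one uniformly chosen element and records the full state.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def simulate_random_sequential(rules: Rules, init: State, max_steps: int,
                               rng: random.Random) -> List[State]:
    """Random sequential, uniform scheme: each step updates one uniformly chosen element."""
    state = dict(init)
    trace = [dict(state)]
    nodes = list(rules)
    for _ in range(max_steps):
        node = rng.choice(nodes)          # every element is equally likely to be chosen
        state[node] = rules[node](state)  # apply that element's Boolean update function
        trace.append(dict(state))         # record all variable values after this step
    return trace
```

Passing an explicit `random.Random` instance makes runs reproducible, and independent seeds yield the independent traces that StatMC requires.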

Since the underlying semantic model of the stochastic simulation method we use for the extended BN models is essentially a discrete-time Markov chain, the verification problem is to compute the probability that the system satisfies a given temporal logic formula. For large and complex models, numerical methods that compute the exact probability suffer from the state explosion problem. StatMC, instead of searching the entire state space, uses statistical testing methods to estimate the probability efficiently within a preset small error bound.
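As an illustration of how a preset error bound translates into a number of simulations, one standard choice is the Chernoff-Hoeffding bound used in approximate probabilistic model checking. If N ≥ ln(2/δ)/(2ε²) independent traces are checked, the fraction that satisfies the property is within ε of the true probability with confidence at least 1 - δ. The text does not state which statistical test PRISM was configured with, so this sketch is only one possible instantiation. `sample_trace_ok` is a hypothetical callback that simulates one trace and returns whether it satisfies the property.

```python
import math
import random
from typing import Callable


def hoeffding_samples(epsilon: float, delta: float) -> int:
    """Smallest N with 2 * exp(-2 * N * epsilon**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_probability(sample_trace_ok: Callable[[random.Random], bool],
                         epsilon: float, delta: float, seed: int = 0) -> float:
    """Fraction of N independent traces that satisfy the property."""
    rng = random.Random(seed)
    n = hoeffding_samples(epsilon, delta)
    return sum(sample_trace_ok(rng) for _ in range(n)) / n


if __name__ == "__main__":
    print(hoeffding_samples(0.01, 0.05))  # traces needed for +/-0.01 at 95% confidence
```

For ε = 0.01 and δ = 0.05 the bound requires 18,445 traces. The number depends only on ε and δ, not on the size of the model's state space, which is why this approach avoids state explosion.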


7.4  Result  and  Discussion 

The  framework  is  implemented  in  Python.  The  simulator  described  in  Chapter  7.3  is  implemented 
in Java dTJ. We use PRISM [150], a tool for formal modeling and analysis of stochastic systems, as our statistical model checker. Evaluating a model against one property, including running the simulations, takes about 10 minutes on a regular laptop. The other components of the framework take less than 1 minute.

Baseline  model 

The system we studied is the pancreatic cancer microenvironment, especially the interplay between pancreatic cancer cells (PCCs) and pancreatic stellate cells (PSCs). We use the BN representation of our microenvironment model (see Figure 6.1 in Chapter 6.1) as the baseline model. This model contains three parts: the intracellular signaling networks of PCC and PSC, and their interplay mediated by extracellular molecules in the tumor microenvironment. In this model, several cellular functions, such as autophagy, apoptosis, proliferation, and migration, are also implemented as variables, which allows for a better understanding of the system's behavior. Specifically, there are 30 variables encoding intracellular molecules in PCC and 3 variables encoding the cell functions of PCC. For PSC, there are 24 variables for intracellular molecules and 4 variables for its cell functions. In the extracellular microenvironment, there are 8 variables encoding extracellular growth factors and 1 environmental function variable. In total, there are 70 variables and 114 interactions in the baseline model. The interaction rules of this model are summarized in Table 1 of the Supplementary material (http://ppt.cc/X1WF7).

System  properties  as  the  selection  standard 

To demonstrate the feasibility of our framework, we apply it to the study of the interplay between PCCs and PSCs in the pancreatic cancer microenvironment. We identify a set of system properties based on the experimental observations reported for this biological system, and extract the set of “elements of interest” from these properties. As listed in Table 7.1, we are interested in observing changes in important growth factors in the tumor microenvironment, oncoproteins in both PCCs and PSCs, tumor suppressors in PCCs, and cell functions of PCCs and PSCs.
Auto-reading outputs and extended models

We used the output of the REACH automated reading engine on 13,000 papers available in the public
domain. This output consists of 500,000 event files, with 170,000 possible extensions of our model
(the other events are corroborations or contradictions). Although reading produced 170,000 model
extensions, many of them are repetitions, and some of the reading outputs were missing one of the
interaction participants. Therefore, in this work we used a total of 1232 distinct interactions from
the reading output, which could lead to 2^1232 possible models. Since studying all possible model
versions is impractical, we used the four extension methods described in Section 7.2 to generate 46
different models. Using the CD method, we generated 2 models with 1 or 2 layers. For ND, the number
of layers varied between 1 and 10, which resulted in 10 models. With CI, we used 0, 1, 2, or 3
layers, which led to 4 different models. Finally, for NI, n ranges from 1 to 10 layers and m ranges
from 1 to 3, resulting in 30 models. We also tested the model obtained by adding all extensions to
the baseline model.
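The total of 46 extended models follows directly from the parameter ranges above. The Python sketch below enumerates those settings; the tuple encoding `(method, layers)` and `("NI", n, m)` is an illustrative assumption, as is our reading of m as the number of cumulative steps. It also shows the size of the exhaustive alternative.

```python
from itertools import product
from typing import List, Tuple

NUM_INTERACTIONS = 1232  # distinct interactions kept from the reading output


def extension_configurations() -> List[Tuple]:
    """Enumerate the extension settings described in the text (hypothetical encoding)."""
    configs: List[Tuple] = [("CD", layers) for layers in (1, 2)]
    configs += [("ND", layers) for layers in range(1, 11)]
    configs += [("CI", layers) for layers in (0, 1, 2, 3)]
    # NI: n noncumulative layers (1..10) combined with m cumulative steps (1..3), as we read it
    configs += [("NI", n, m) for n, m in product(range(1, 11), range(1, 4))]
    return configs


configs = extension_configurations()
print(len(configs))      # 46 extended models
print(len(configs) + 2)  # plus the baseline and All-In models: 48
# Every subset of the interactions is a candidate model; 2**1232 has 371 decimal digits.
print(len(str(2 ** NUM_INTERACTIONS)))
```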

Fig. 7.3(a) summarizes the results of our extension methods on the 1232 interactions with respect
to new node connections to the model: (i) the number of new nodes regulating baseline model
elements but not regulated by baseline model elements (dark blue); (ii) the number of new nodes
regulating baseline model elements but not regulated by any element, baseline or new (red); (iii)
the number of new nodes regulated by baseline model elements but not regulating any element in the
baseline model (yellow); (iv) the number of new nodes regulated by baseline model elements but not
regulating any element, baseline or new (purple); (v) the number of new nodes inserted into an
existing pathway, that is, new regulators of baseline model elements that are also regulated by
baseline model elements (green); (vi) the number of new nodes that are intermediate elements of new
pathways formed when multiple extensions are connected (light blue); and (vii) the total number of
elements used in the extension method (dark red). Fig. 7.3(a) is divided into four sections, each
corresponding to one of the extension methods, and each method has its own characteristic profile.
For example, the ND method only includes relationships relevant to one layer, which makes the number
of new elements added to the model significantly smaller than with the other methods. The light blue
counts give the number of newly added elements that lie in a newly formed pathway. Since CD and ND
do not include indirect parent interactions, the number of elements in new pathways is 0 for these
methods, whereas the nonzero counts for CI and NI show that indirect interactions are included.
Counts within one method are more similar to each other, but some patterns can still be observed.
For example, the cumulative parent-set methods, CD and CI, show an increase in the number of new
nodes when more layers are considered. Furthermore, since NI switches to cumulative parents after
finishing its noncumulative part, its counts also increase when the number of noncumulative steps is
fixed and more cumulative steps are taken. The counts saturate at around 600 because of the limited
size of the baseline model and of the available extensions. This is also the reason we perform at
most 3 cumulative steps.
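The connection categories (i) to (vii) can be computed from a list of regulator-target pairs. A minimal Python sketch under stated assumptions: edges are `(regulator, target)` pairs, category (vi) is approximated as new nodes with no direct edge to or from a baseline element, and (vii) is taken to be the number of new nodes. The name `classify_new_nodes` and the toy network are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (regulator, regulated element)


def classify_new_nodes(baseline: Set[str], edges: Iterable[Edge]) -> Dict[str, int]:
    """Count new nodes per connection category (i)-(vii) of Fig. 7.3(a), under assumed definitions."""
    edges = list(edges)
    regulators: Dict[str, Set[str]] = defaultdict(set)  # node -> nodes regulating it
    targets: Dict[str, Set[str]] = defaultdict(set)     # node -> nodes it regulates
    for reg, tgt in edges:
        regulators[tgt].add(reg)
        targets[reg].add(tgt)

    new_nodes = {n for edge in edges for n in edge} - baseline
    counts: Counter = Counter()
    for node in new_nodes:
        regulates_orig = bool(targets[node] & baseline)
        regulated_by_orig = bool(regulators[node] & baseline)
        if regulates_orig and not regulated_by_orig:
            counts["i"] += 1
            if not regulators[node]:      # not regulated by any element, baseline or new
                counts["ii"] += 1
        elif regulated_by_orig and not regulates_orig:
            counts["iii"] += 1
            if not targets[node]:         # regulates no element at all
                counts["iv"] += 1
        elif regulates_orig and regulated_by_orig:
            counts["v"] += 1              # inserted into an existing pathway
        else:
            counts["vi"] += 1             # connected to new nodes only (assumed meaning)
    counts["vii"] = len(new_nodes)        # assumed meaning of the total
    return dict(counts)


edges = [("X", "A"), ("B", "Y"), ("Z", "A"), ("B", "Z"), ("W", "V"), ("V", "A")]
print(classify_new_nodes({"A", "B"}, edges))
```

Here X only feeds into the baseline (categories i and ii), Y only receives from it (iii and iv), Z sits on a baseline pathway (v), V is fed only by the new node W (i but not ii), and W has no direct baseline contact (vi).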

In general, the choice of extension method depends on the scenario a user is interested in. For
example, if the focus is on the regulation of a specific element, one can track down each layer of
parents using ND and observe how the model changes after modifying that specific layer. On the other
hand, if the goal is to include as many new stimuli as possible with fewer layers, cumulative methods
such as CI or CD are a better fit. We selected 20 elements for the base layer, since these elements
appear in the properties we are testing; this makes the base layer relatively large given the size
of the baseline model. Therefore, by incorporating elements from more than one layer, we capture
almost all extensions related to the baseline model. Thus, the All-In method, which adds all
extension interactions to the baseline model at once, does not change the counts shown in
Fig. 7.3(a) compared with many of the CD, CI, and NI models.
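The contrast between following one layer of parents at a time (ND) and accumulating layers (CD) can be sketched in Python for direct parents only. This rests on our own assumption that ND with k layers adds only the interactions from parent layer k into layer k-1, while CD with k layers adds all interactions from layers 1 through k; the precise definitions are those of Section 7.2, and `parent_layers`, `nd_edges`, and `cd_edges` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, regulated element)


def parent_layers(base: Set[str], edges: Iterable[Edge], depth: int) -> List[Set[str]]:
    """Layer 0 is the base layer; layer k holds direct regulators of layer k-1 not seen before."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for reg, tgt in edges:
        parents[tgt].add(reg)
    layers = [set(base)]
    seen = set(base)
    for _ in range(depth):
        nxt = {p for node in layers[-1] for p in parents[node]} - seen
        seen |= nxt
        layers.append(nxt)
    return layers


def nd_edges(base: Set[str], edges: Iterable[Edge], k: int) -> Set[Edge]:
    """Noncumulative (assumed): only the interactions from layer k into layer k-1."""
    edges = list(edges)
    layers = parent_layers(base, edges, k)
    return {(r, t) for r, t in edges if r in layers[k] and t in layers[k - 1]}


def cd_edges(base: Set[str], edges: Iterable[Edge], k: int) -> Set[Edge]:
    """Cumulative (assumed): union of the per-layer interaction sets for layers 1..k."""
    edges = list(edges)
    result: Set[Edge] = set()
    for j in range(1, k + 1):
        result |= nd_edges(base, edges, j)
    return result


edges = [("B", "A"), ("E", "A"), ("C", "B"), ("D", "C")]
print(nd_edges({"A"}, edges, 2))          # {('C', 'B')}
print(sorted(cd_edges({"A"}, edges, 2)))  # [('B', 'A'), ('C', 'B'), ('E', 'A')]
```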


Figure 7.3: (a) Counts of newly added elements with a certain structure (Reg.: regulator, Tgt.:
regulated element, Orig.: baseline model element, New: newly added element). All studied models are
listed on the x-axis, and the y-axis is the count of new elements with a certain structure. (b)
Results of statistical model checking of 20 properties in 68 different models. Each entry on the
x-axis is a model, and each row is the estimated probability of the corresponding property. (c) The
maximum and minimum difference from the baseline model for each property [153].


Impact  of  model  extension  on  system  properties 

Fig. 7.3(b) shows the results of testing 48 models (baseline + All-In + 46 extended models) with
different extension methods (AND/OR) and different initializations of the newly added elements
(True/False) against the 20 properties in Table 7.1. The values displayed are the estimated
probabilities of each property. As with the node counts of each model, different extension methods
lead to different property results. For example, the results from ND differ from those of the other
methods. The reason is that each ND model only deals with one layer at a time and does not insert new
edges between elements mentioned in the properties, which leads to a more conservative extension. As
another example, OR-based ND models differ in properties 9 to 13, and AND-based ND models differ in
property 4; these properties are related to the inhibition of tumor suppressors and to autophagy in
PCCs. By comparing the extension interactions added to those models, we found that the EGF
(Epidermal Growth Factor) pathway plays the most important role. The p21 (regulator of cell cycle
progression) pathway also contributes to the difference.


If we compare the models with different initializations of the newly added nodes, the results are
quite similar. This means that the model is mostly influenced by the input elements of the baseline
model and, to some degree, indicates the robustness of the original model. On the other hand,
extending the models with OR operations instead of AND operations makes a large difference.
Interestingly, the two types of extensions push the model behavior in opposite directions: they
behave similarly only in properties 9, 13, 16, 19, and 20, and differently in the other 15
properties. This drastic difference between AND-based and OR-based extension can be exploited when
designing extensions to fit a desired property. Fig. 7.3(c) shows the maximum and minimum difference
from the baseline that the models achieve for each property. If both the maximum and the minimum
difference of a property's probability are small, the property is relatively robust to the extension
interactions. An example is property 16, which describes the relationship between p53 and apoptosis.
On the other hand, if both the maximum and the minimum difference are large, the property is
susceptible to changes in value when extensions are added.
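The maximum and minimum differences plotted in Fig. 7.3(c) can be computed from a table of estimated probabilities, one list per model. In the Python sketch below the model names, the probability values, and the tolerance used to call a property robust are all made up for illustration.

```python
from typing import Dict, List, Tuple


def diff_range(probs: Dict[str, List[float]], baseline: str = "baseline") -> List[Tuple[float, float]]:
    """Per property, the (max, min) signed difference of any model's probability from the baseline."""
    base = probs[baseline]
    ranges = []
    for i, p_base in enumerate(base):
        diffs = [p[i] - p_base for name, p in probs.items() if name != baseline]
        ranges.append((max(diffs), min(diffs)))
    return ranges


def robust_properties(ranges: List[Tuple[float, float]], tol: float = 0.05) -> List[bool]:
    """Treat a property as robust if neither extreme moves more than tol (tol is an arbitrary choice)."""
    return [max(abs(hi), abs(lo)) <= tol for hi, lo in ranges]


probs = {  # hypothetical probabilities for two properties, not results from the thesis
    "baseline": [0.90, 0.20],
    "OR-ND-1": [0.92, 0.80],
    "AND-ND-1": [0.88, 0.10],
}
ranges = diff_range(probs)
print(robust_properties(ranges))  # [True, False]
```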
# | Property | Description

Growth factors in the tumor microenvironment
1 | F[1000] (G[10000] VEGF) | Within 1000 time units, the concentration of VEGF (/ bFGF / PDGFBB / TGFβ1) in the tumor microenvironment will eventually reach a high amount and stay at this high level for at least another 10000 time units.
2 | F[1000] (G[10000] bFGF) |
3 | F[1000] (G[10000] PDGFBB) |
4 | F[1000] (G[10000] TGFβ1) |

Oncoproteins in pancreatic cancer and stellate cells
5 | F[1400] (G[10000] PCCHER2) | Within 1400 time units, the concentration of HER2 in PCCs (/ RAS in PCCs / VEGFR in PSCs / ERK in PCCs) will eventually reach a high amount and stay at this high level for at least another 10000 time units.
6 | F[1400] (G[10000] PCCRAS) |
7 | F[1400] (G[10000] PSCVEGFR) |
8 | F[1400] (G[10000] PCCERK) |

Tumor suppressors in pancreatic cancer cells
9 | F[1000] (PCCP21 ∧ F[2000] (G[10000] (! PCCP21))) | The concentration of P21 (/ PTEN / RB / P53) in PCCs will reach a high level and act as a tumor suppressor within the first 1000 time units. Then, after at most 2000 time units, its concentration will eventually drop to a low level and stay at this low level for at least another 10000 time units.
10 | F[1000] (PCCPTEN ∧ F[2000] (G[10000] (! PCCPTEN))) |
11 | F[1000] (PCCRB ∧ F[2000] (G[10000] (! PCCRB))) |
12 | F[1000] (PCCP53 ∧ F[2000] (G[10000] (! PCCP53))) |

Cell functions of pancreatic cancer cells
13 | F[1000] ((! PCCAutophagy) ∧ F[2000] (G[10000] PCCAutophagy)) | In the development of pancreatic cancer, apoptosis first dominates autophagy, and autophagy then takes the leading role after a certain time point.
14 | F[1000] (PCCApoptosis ∧ F[2000] (G[10000] (! PCCApoptosis))) |
15 | F[1000] (G[10000] PCCProliferation) | Within 1000 time units, PCC proliferation will eventually be activated and remain active for at least another 10000 time units.
16 | !(! PCCP53 U[12000] PCCApoptosis) | It is not the case that, within 12000 steps, P53 in PCCs has to stay at a low concentration level until PCC apoptosis is triggered.

Cell functions of pancreatic stellate cells
17 | F[1000] (G[10000] PSCActivation) | Within 1000 time units, PSC activation (/ migration) will eventually be switched on and remain on for at least another 10000 time units.
18 | F[1000] (G[10000] PSCMigration) |
19 | F[1000] (PSCApoptosis ∧ F[1000] (G[10000] (! PSCApoptosis))) | Within 1000 time units, PSC apoptosis will be triggered. Then, after at most 1000 time units, the initially active apoptosis in PSCs will be inhibited and stay inactive for at least 10000 time units.
20 | F[12000] PSCProliferation | Within 12000 steps, PSC proliferation will eventually be triggered.

!: logical not, ∧: logical and, F: eventually, G: always, U: until


Table  7.1:  System  properties  used  for  the  model  selection  using  statistical  model  checking 
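The properties in Table 7.1 are bounded temporal logic formulas evaluated over simulation traces. The Python sketch below shows one way to decide such a formula on a single finite trace. It assumes one trace position per time unit and Boolean (high/low) element levels, and encodes formulas as nested tuples instead of parsing the table's syntax; `horizon` and `satisfies` are hypothetical names, and this is not the model checker used in this chapter.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Trace = List[Dict[str, bool]]  # one Boolean valuation per time step
Formula = Tuple  # ("ap", name) | ("not", f) | ("and", f, g) | ("F", k, f) | ("G", k, f) | ("U", k, f, g)


def horizon(f: Formula) -> int:
    """Number of future steps a bounded formula needs in order to be decided."""
    op = f[0]
    if op == "ap":
        return 0
    if op == "not":
        return horizon(f[1])
    if op == "and":
        return max(horizon(f[1]), horizon(f[2]))
    if op in ("F", "G"):
        return f[1] + horizon(f[2])
    if op == "U":
        return f[1] + max(horizon(f[2]), horizon(f[3]))
    raise ValueError(f"unknown operator {op!r}")


def satisfies(trace: Trace, f: Formula) -> bool:
    """Check a bounded temporal formula at time 0 of a finite discrete-time trace."""
    if len(trace) <= horizon(f):
        raise ValueError("trace too short to decide the formula")

    @lru_cache(maxsize=None)
    def holds(g: Formula, i: int) -> bool:
        op = g[0]
        if op == "ap":
            return trace[i][g[1]]
        if op == "not":
            return not holds(g[1], i)
        if op == "and":
            return holds(g[1], i) and holds(g[2], i)
        if op == "F":  # eventually, within k steps
            return any(holds(g[2], j) for j in range(i, i + g[1] + 1))
        if op == "G":  # always, for the next k steps
            return all(holds(g[2], j) for j in range(i, i + g[1] + 1))
        if op == "U":  # g[3] within k steps, with g[2] holding at every step before it
            for j in range(i, i + g[1] + 1):
                if holds(g[3], j):
                    return True
                if not holds(g[2], j):
                    return False
            return False
        raise ValueError(f"unknown operator {op!r}")

    return holds(f, 0)


# A scaled-down version of property 9: F[3] (p ∧ F[2] (G[4] !p)).
p = ("ap", "p")
prop9_small = ("F", 3, ("and", p, ("F", 2, ("G", 4, ("not", p)))))
pulse = [{"p": bool(v)} for v in [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
stuck = [{"p": bool(v)} for v in [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
print(satisfies(pulse, prop9_small), satisfies(stuck, prop9_small))  # True False
```

Memoization avoids re-evaluating subformulas, but nested bounds such as F[1000] (G[10000] ...) still require on the order of 10^7 checks per trace in this naive form.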
Chapter 8

Conclusion  and  Future  Work 


In computer science, formal specification and analysis methods are used to design computer systems
and prove their properties. If a desired property of a computer system turns out to fail, we can in
principle adapt and refine the system at hand. Orthogonal to this usage in computer science, in
biology, an empirical science in which biological systems are a given, formal methods serve to better
understand the inner workings and emergent properties of such systems. This thesis developed a new
modeling language, formal analysis methods, and models for biological systems at different levels:
the molecular, cellular, tissue, organ, and whole-organism levels. It shows how the study of
biological and biomedical systems exhibiting nonlinearity, nondeterminism, and stochasticity can
benefit from modeling formalisms at different abstraction levels and from model checking techniques.

The work presented in this thesis can be classified into three groups according to three research
motivations. The first group is motivated by the study of pancreatic cancer and includes the
single-cell analysis of pancreatic cancer cells, the study of the interplay between pancreatic cancer
and stellate cells, and the development of a framework in which formal methods and machine learning
algorithms are used to automate model construction and refinement for pancreatic cancer.

In Chapter 2, we have presented and formally checked an in silico model of a single pancreatic cancer
cell. The model incorporates important signaling pathways that are frequently implicated in
pancreatic cancer. We have verified temporal logic properties encoding behavior related to cell fate,
the cell cycle, and oscillations in the expression levels of key proteins. As shown in that chapter,
the model agrees well qualitatively with experiments. We have also suggested several properties that
could be tested by future experiments.

In recent years, because of the poor treatment outcomes for pancreatic cancer, the research focus has
shifted from looking solely at pancreatic cancer cells towards investigating the pancreatic cancer
microenvironment, which makes understanding the microenvironment of great importance and interest. In
Chapter 6, we have presented a multicellular and multiscale model of the PC microenvironment. The
model is formally described using the extended BioNetGen language, which can capture the dynamics of
multiscale biological systems using a combination of continuous and discrete rules. We have carried
out stochastic simulation and StatMC to analyze system behaviors under distinct conditions. Our
verification results have confirmed the experimental findings regarding the mutual promotion between
PCCs and PSCs. We have also gained insights into how existing treatments aimed at different targets
can lead to distinct outcomes. These results demonstrate that our model might be used as a prognostic
platform to identify new drug targets. We have then identified four potential (poly)pharmacological
strategies for depleting PSCs and inhibiting PC development. We plan to test our predictions
empirically. Another interesting direction is to extend the model by considering spatial information
[82] and tumor-associated macrophages (TAMs) in the PC microenvironment.

The above two works show that constructing and analyzing biological models can help to explain the
systems being studied, raise new questions, and even challenge existing findings. However, model
creation often relies on intense human effort. This results in slow model development, let alone
extending models with the thousands of other possible component interactions reported in newly
published literature. There is therefore an urgent need to automate information extraction from the
literature, its smart integration into models, and efficient and correct model analysis, so as to
allow researchers to re-use previously published work comprehensively and in a timely manner. In
Chapter 7, we have proposed a framework that uses published work to collect extensions for existing
models, and then analyzes these extensions using stochastic simulation and statistical model
checking. With biological properties formulated in temporal logic, the model checker can use the
traces generated by the simulator to estimate the probability that a certain property holds. This
gives us an efficient approach (a speed-up from decades to hours) to re-use previously published
results and observations for conducting hundreds of in silico experiments with different setups
(models). Our methods and the framework we have developed constitute a promising new approach to
comprehensively utilize published work. Moreover, this framework can also be used to search for
pathways or interactions that are vital to certain functions, and to suggest targets for drug
development. For example, using the ND models and the statistical model checker, we can study closely
how each layer of elements influences the elements we are interested in. We can then pinpoint the
models that satisfy several desired properties, which should allow us to identify a few candidates
that play important roles in the regulation. Alternatively, using the NI method, we can further
observe whether there is an upstream network that controls the behavior of these elements. This
gives us a deeper understanding of the network and helps us in further model development.
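Estimating the probability that a property holds from simulator traces can be sketched as a fixed-size Monte Carlo estimate whose sample count comes from the Chernoff-Hoeffding bound, P(|p̂ - p| ≥ ε) ≤ 2 exp(-2nε²). Which statistical test the framework's checker actually uses is not stated here, so this choice, the toy simulator, and all function names are assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def chernoff_hoeffding_samples(eps: float, delta: float) -> int:
    """Smallest n with 2*exp(-2*n*eps^2) <= delta (additive error eps, confidence 1 - delta)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_probability(
    simulate: Callable[[random.Random], List[bool]],
    check: Callable[[List[bool]], bool],
    eps: float = 0.01,
    delta: float = 0.05,
    seed: int = 0,
) -> Tuple[float, int]:
    """Monte Carlo estimate of P(property holds) from independent simulated traces."""
    rng = random.Random(seed)
    n = chernoff_hoeffding_samples(eps, delta)
    successes = sum(1 for _ in range(n) if check(simulate(rng)))
    return successes / n, n


def toy_simulate(rng: random.Random) -> List[bool]:
    """Hypothetical simulator: an element switches on at a random step in 0..19 and stays on."""
    switch_on = rng.randint(0, 19)
    return [t >= switch_on for t in range(20)]


def eventually_on_within_10(trace: List[bool]) -> bool:
    """F[10] on: the element is on at some step in 0..10 (true probability 11/20 = 0.55)."""
    return any(trace[:11])


p_hat, n = estimate_probability(toy_simulate, eventually_on_within_10)
print(n, round(p_hat, 3))  # n = 18445
```

The sample count depends only on ε and δ, not on the model, which is what makes this kind of estimate easy to run over many extended models.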

The second group addresses bounded reachability problems for hybrid systems and stochastic hybrid
systems, which are widely used to model biological systems. In Chapter 4, we have studied a novel
method of killing bacteria using bacteriophages instead of antibiotics. A bacteriophage can be
engineered to carry the genetic code for proteins that, once inside the bacteria, can be activated and
kill them. Specifically, in this work we studied photosensitizing proteins, which produce reactive
oxygen species (ROS) when exposed to light. Excess amounts of ROS result in cell death. We created a
hybrid model expressing both continuous and discrete dynamics, defined within each of the stages that
bacteria can go through, and used our tool for hybrid system reachability analysis (which implements
the δ-decisions technique) to determine model parameters that are otherwise hard or impossible to
obtain experimentally. We were especially interested in timing effects: when the cells should be
exposed to light, how long the light exposure should be, and how long it takes photosensitizing
proteins to kill bacterial cells after exposure to light. Our analysis shows that timing will be
critical if this treatment, using bacteriophages and photosensitizing proteins, is used for killing
bacteria: a delay in exposure to light can significantly delay bacteria killing and could potentially
lead to complications such as sepsis, and the duration of light exposure is critical, since turning
the light off too early may not result in killing. Interestingly, we found that a broader range of SOX
levels could kill bacteria, although the time to reach this effect may again be too long for practical
purposes. We noticed that very low levels of SOX are efficient in bacteria killing, while medium levels
result in the longest time to killing; we are further investigating these results, as they point to
potential improvements in our model. The results we obtained can offer hints for the design of
wet-lab experiments.
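The δ-decision idea can be illustrated on a one-dimensional constraint. In the toy Python sketch below (our own illustration, not the Chapter 4 tool, and without the outward floating-point rounding that a real solver such as dReal needs), a branch-and-prune loop either shows that f(x) = 0 has no solution in a box (unsat) or returns a small box on which |f(x)| ≤ δ everywhere (δ-sat). The constraint f(x) = x^2 - 2 and its interval extension are chosen purely for illustration.

```python
import math
from typing import Callable, Optional, Tuple

Interval = Tuple[float, float]


def interval_square(x: Interval) -> Interval:
    """Exact range of t*t for t in the interval x."""
    lo, hi = x
    if lo >= 0:
        return (lo * lo, hi * hi)
    if hi <= 0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def f_interval(x: Interval) -> Interval:
    """Interval extension of the toy function f(x) = x^2 - 2."""
    lo, hi = interval_square(x)
    return (lo - 2.0, hi - 2.0)


def delta_decide(F: Callable[[Interval], Interval], box: Interval, delta: float,
                 max_splits: int = 1_000_000) -> Optional[Interval]:
    """Return None if f(x) = 0 has no solution in box, else a delta-sat witness box."""
    stack = [box]
    splits = 0
    while stack:
        lo, hi = stack.pop()
        f_lo, f_hi = F((lo, hi))
        if f_lo > 0 or f_hi < 0:
            continue                      # 0 is not in F(box): prune this box
        if -delta <= f_lo and f_hi <= delta:
            return (lo, hi)               # |f| <= delta on the whole box: delta-sat
        splits += 1
        if splits > max_splits:
            raise RuntimeError("split budget exhausted")
        mid = 0.5 * (lo + hi)
        stack.extend([(lo, mid), (mid, hi)])
    return None                           # every box pruned: unsat


witness = delta_decide(f_interval, (0.0, 2.0), 1e-3)
print(witness, math.sqrt(2))                       # a tiny box around sqrt(2)
print(delta_decide(f_interval, (0.0, 1.0), 1e-3))  # None
```

The answer "δ-sat" only says that a δ-perturbed version of the constraint is satisfiable, which is exactly the relaxation that makes such procedures terminate on nonlinear real arithmetic.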

Randomness occurs naturally in real-world systems. To analyze biological systems with uncertainties,
in Chapter 5 we have developed and implemented the SReach algorithm, which solves probabilistic
bounded reachability problems for two classes of stochastic hybrid system models. The first is
(nonlinear) hybrid automata with parametric uncertainty. The second is probabilistic hybrid automata
with additional randomness in both transition probabilities and variable resets. Standard approaches
to reachability problems for linear hybrid systems require numerical solutions of large optimization
problems and become infeasible for systems involving both nonlinear dynamics over the reals and
stochasticity. SReach encodes stochastic information using a set of introduced random variables, and
combines δ-complete decision procedures and statistical tests to solve δ-reachability problems in a
sound manner. Compared to standard simulation-based methods, it supports non-deterministic branching,
increases the coverage of simulation, and avoids the zero-crossing problem. To demonstrate SReach's
applicability, we have used it to analyze three representative examples (a prostate cancer treatment
model, a cardiac model, and a model of the tap withdrawal circuit in C. elegans) as well as other
benchmarks, which are currently beyond the reach of other formal tools.
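How sampled random variables, a per-sample reachability decision, and a statistical test fit together can be sketched on a toy model dx/dt = -k x with a uniformly distributed rate k. Here reachability of x ≤ 0.2 within time 5 is decided exactly in closed form, standing in for the δ-complete SMT check, and Wald's sequential probability ratio test decides whether the reachability probability exceeds a threshold. The model, its parameters, and the choice of SPRT are our assumptions; SReach's own statistical tests may differ.

```python
import math
import random
from typing import Callable, Tuple


def reaches_target(k: float, x0: float = 1.0, target: float = 0.2, horizon: float = 5.0) -> bool:
    """Exact bounded reachability for dx/dt = -k*x: is x(t) <= target for some t <= horizon?"""
    # x(t) = x0 * exp(-k t) is decreasing, so checking t = horizon suffices.
    return x0 * math.exp(-k * horizon) <= target


def sample_run(rng: random.Random) -> bool:
    k = rng.uniform(0.1, 0.5)  # the random parameter of the toy model
    return reaches_target(k)


def sprt(sample: Callable[[random.Random], bool], theta: float, indifference: float,
         alpha: float, beta: float, seed: int = 0, max_samples: int = 1_000_000) -> Tuple[bool, int]:
    """Wald's SPRT for H0: p >= theta + indifference versus H1: p <= theta - indifference.

    Returns (True, n) if H0 is accepted (probability above theta), (False, n) otherwise.
    """
    p0, p1 = theta + indifference, theta - indifference
    accept_h0 = math.log(beta / (1.0 - alpha))
    accept_h1 = math.log((1.0 - beta) / alpha)
    rng = random.Random(seed)
    llr = 0.0  # log-likelihood ratio log(L(H1) / L(H0))
    for n in range(1, max_samples + 1):
        if sample(rng):
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr <= accept_h0:
            return True, n
        if llr >= accept_h1:
            return False, n
    raise RuntimeError("no decision within max_samples")


# Exact probability for comparison: P(k >= ln(5)/5) with k ~ U(0.1, 0.5), about 0.445.
exact = (0.5 - math.log(5.0) / 5.0) / 0.4
print(round(exact, 3))
print(sprt(sample_run, 0.3, 0.05, 0.001, 0.001))
print(sprt(sample_run, 0.6, 0.05, 0.001, 0.001))
```

A sequential test stops as soon as the evidence is strong enough, so far from the threshold it typically needs only tens to a few hundred samples, each of which would be an expensive solver call in the real setting.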

In the third thread, discussed in Chapter 3, we have developed and implemented an efficient LTL
bounded model checking algorithm for Qualitative Networks, which extend Boolean networks by using
discrete variables and algebraic functions as update functions. Our technique utilizes the unique
structure of Qualitative Networks to construct “decreasing reachability sets”. These sets form part of
a compact representation of paths in the QN and lead to a significant acceleration in an
implementation of bounded model checking. We find the experimental results very encouraging,
especially given the iterative development methodology biologists have been using when employing our
tool BMA. As mentioned, our users “try out” several options and refine them according to the results
of simulation and verification. In this iterative process it is most important to be able to give
fast answers to user queries. Given the speed-ups afforded by this new technique, we have shown that
model checking can be incorporated into the workflow of using our tools.
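The idea of decreasing reachability sets can be sketched for a tiny Qualitative Network. In the Python sketch below, each variable moves one unit per step toward the value of its target function, and we over-approximate, per variable, the range of values reachable after exactly k steps from any state by evaluating target functions on intervals. Because the true sets can only shrink as k grows, each new range is intersected with the previous one. The three-variable network, its target functions, and the interval encoding are hypothetical and do not reproduce the Chapter 3 implementation in BMA.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]
Box = Dict[str, Interval]  # per-variable value range


def step(v: int, t: int) -> int:
    """QN update rule: move one unit toward the target value t (stay if equal)."""
    if t > v:
        return v + 1
    if t < v:
        return v - 1
    return v


def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))


def decreasing_reach_sets(domains: Box, target_bounds: Dict[str, Callable[[Box], Interval]],
                          max_iter: int = 1000) -> List[Box]:
    """Over-approximate the states reachable after exactly k steps, k = 0, 1, ..., up to a fixpoint."""
    sequence = [dict(domains)]
    for _ in range(max_iter):
        current = sequence[-1]
        nxt: Box = {}
        for var, (lo, hi) in current.items():
            t_lo, t_hi = target_bounds[var](current)
            # step() is monotone in both arguments, so the extremes bound the new range.
            new_lo, new_hi = step(lo, t_lo), step(hi, t_hi)
            nxt[var] = (max(lo, new_lo), min(hi, new_hi))  # keep the sequence decreasing
        if nxt == current:
            break
        sequence.append(nxt)
    return sequence


# Hypothetical network over {0, 1, 2}: A has constant target 1, B follows A, C targets clamp(B - A).
targets = {
    "A": lambda box: (1, 1),
    "B": lambda box: box["A"],
    "C": lambda box: (clamp(box["B"][0] - box["A"][1], 0, 2),
                      clamp(box["B"][1] - box["A"][0], 0, 2)),
}
for k, box in enumerate(decreasing_reach_sets({v: (0, 2) for v in "ABC"}, targets)):
    print(k, box)
```

For this network the ranges shrink until step 3, where A = 1, B = 1, and C = 0, so a bounded model checker could restrict these variables to those values at every unrolling step from 3 onward.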

The work we have presented in this thesis suggests several ideas for future work.

Stochastic Hybrid Systems with Stochastic Differential Equations (SDEs). In Chapter 5, we have
proposed and implemented a probabilistic reachability analysis method for two classes of stochastic
hybrid systems, where randomness is introduced by system parameters and discrete transitions.
However, in real-world biological systems, many processes are intrinsically stochastic, such as
population changes of species in ecosystems. Their dynamics are usually modeled by SDEs. Thus, one
future topic is to design a model checking technique for general stochastic hybrid systems (GSHSs)
where, besides probabilistic transitions, stochastic differential equations are used to capture
continuous dynamics. One approach may be to introduce a new quantifier symbol for random variables and
SDE constraints for stochastic processes, where a new SDE solver, which makes use of numerical
solutions to SDEs and of simulation-based methods estimating distributions of hitting times for
stochastic processes, can be integrated into existing nonlinear SMT solvers.
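The simulation-based ingredient mentioned above, estimating hitting-time distributions of a stochastic process, can be sketched with the Euler-Maruyama scheme. The Python sketch below is illustrative only: the SDE dX = -X dt + σ X dW, the threshold 0.5, the step size, and the function names are arbitrary choices, and no integration with an SMT solver is attempted.

```python
import math
import random
from typing import Callable, Optional


def hitting_time(x0: float, drift: Callable[[float], float], diffusion: Callable[[float], float],
                 threshold: float, t_max: float, dt: float, rng: random.Random) -> Optional[float]:
    """First time an Euler-Maruyama path of dX = drift(X) dt + diffusion(X) dW reaches X <= threshold."""
    x = x0
    n_steps = int(round(t_max / dt))
    sqrt_dt = math.sqrt(dt)
    for i in range(n_steps + 1):
        if x <= threshold:
            return i * dt
        x += drift(x) * dt + diffusion(x) * sqrt_dt * rng.gauss(0.0, 1.0)
    return None  # threshold not reached within t_max


def prob_hit_by(t_max: float, sigma: float, n: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(hitting time <= t_max) for dX = -X dt + sigma X dW, X0 = 1, threshold 0.5."""
    rng = random.Random(seed)
    hits = sum(
        hitting_time(1.0, lambda x: -x, lambda x: sigma * x, 0.5, t_max, 0.01, rng) is not None
        for _ in range(n)
    )
    return hits / n


# With sigma = 0 the path is deterministic and crosses 0.5 near t = ln 2, so these give 1.0 and 0.0.
print(prob_hit_by(1.0, 0.0), prob_hit_by(0.5, 0.0))
print(prob_hit_by(1.0, 0.5))  # with noise, a probability strictly between the two extremes is expected
```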

Considering Multiple Types of Existing Knowledge for the Automated Model Construction Framework.
Following our work in Chapter 7, several improvements can be made to render the framework more general
and robust, including designing a way to consider contradictions, implementing the detection of causal
relations that cause the failure to satisfy a certain system property, and adding more information to
the extended model, for example, the probability that a relation exists. Besides these, one future
topic is to integrate more types of existing knowledge, such as time series datasets, small models
published by others, and images describing related models. To achieve this, appropriate machine
learning algorithms will be needed to learn models (or model fragments) from distinct types of
biological knowledge. Figure 8.1 offers a schematic view of a more general framework in which formal
methods and machine learning work jointly to automate the design of biological models.

Formal Analysis Framework for Models using Various Modeling Languages. In reality, executable models
are constructed with different levels of detail. Which abstraction level to choose is mainly decided by
how much we know about a certain system. For example, since a large amount of work has been carried
out to study the Ras signaling pathway in pancreatic cancer cells due to its importance in cancer
development, rule-based models are usually used to capture the reactions among the involved proteins
in detail. In contrast, the signaling interactions between pancreatic stellate cells and
tumor-associated macrophages have not yet received as much attention, since researchers have only
recently realized that these interactions may be critical during cancer progression and may be one
reason for the poor prognosis of pancreatic cancer. Thus, logical models are used to describe only the
structure of the interacting network. Currently, different analysis frameworks are adopted for the
distinct types of models used to describe different parts of a biological system. However, to
thoroughly study a biological system, it is important to be able to analyze the system as a whole. We
therefore need a formal analysis framework that allows models written in various modeling languages
to be put together and offers a general way to analyze them.
Figure 8.1: Schematic view of how formal methods and machine learning can work jointly to automate
the design of biological models.